Search Results: "noodles"

20 February 2023

Jonathan McDowell: Fixing mobile viewing

It was brought to my attention recently that the mobile viewing experience of this blog was not exactly what I'd hope for. In my poor defence I proofread on my desktop and the only time I see my posts on mobile is via FreshRSS. Also my UX ability sucks. Anyway. I've updated the theme to a more recent version of minima and tried to make sure I haven't broken it all in the process (I did break tagging, but then I fixed it again). I double checked the generated feed to confirm it was the same (other than some re-tagging I did), so hopefully I haven't flooded anyone's feed. Hopefully I can go back to ignoring the underlying blog engine for another 5+ years. If not I'll have to take a closer look at Enrico's staticsite.

17 February 2023

Jonathan McDowell: First impressions of the VisionFive 2

[Image: VisionFive 2 packaging] Back in September last year I chose to back the StarFive VisionFive 2 on Kickstarter. I don't have a particular use in mind for it, but I felt it was one of the first RISC-V systems that were relatively capable (mentally I have it as somewhere between a Raspberry Pi 3 + a Pi 4). In particular it's a quad 1.5GHz 64-bit RISC-V core with 8G RAM, USB3, GigE ethernet and a single M.2 PCIe slot. More than ample as a personal machine for playing around with RISC-V and doing local builds. I ended up paying £67 for the Early Bird variant (dual GigE ethernet rather than 1 x 100Mb and 1 x GigE). A couple of weeks ago I got an email with a tracking number and last week it finally turned up. Being impatient the first thing I did was plug it into a monitor, connect up a keyboard, and power it on. Nothing except some flashing lights. Looking at the boot selector DIP switches suggested it was configured to boot from UART, so I flipped them to (what I thought was) the flash setting. It wasn't - turns out the ON marking on the switches represents logic 0 and it was correctly set up when I got it. I went to read the documentation, which talked about writing an image to a MicroSD card, but also had details of the UART connection. Wanting to make sure the device was at least doing something before I actually tried an OS on it I hooked up a USB/serial dongle and powered the board up again. Success! U-Boot appeared and I could interact with it. I went to the VisionFive2 Debian page and proceeded to torrent the Image-69 image, writing it to a MicroSD card and inserting it in the slot on the bottom of the board. It booted fine. I can't even tell you what graphical environment it booted up because I don't remember; it worked fine though (at 1080p, I've seen reports that 4K screens will make it croak). Poking around the image revealed that it's built off a snapshot.debian.org snapshot from 20220616T194833Z, which is a little dated at this point but I understand the rationale behind picking something that works and sticking with it. The kernel is of course a vendor special, based on 5.15.0. Further investigation revealed that the entire X/graphics stack is living in /usr/local, which isn't overly surprising; it's Imagination based. I was pleasantly surprised to discover there is work to upstream the Imagination support, but I'm not planning to run the board with a monitor attached so it's not a high priority for me. Having discovered all that I decided to see how well a clean Debian unstable install from Debian Ports would go. I had a spare Intel Optane lying around (it's a stupid 22110 M.2 which is too long for any machine I own), so I put it in the slot on the bottom of the board. To my surprise it Just Worked and was detected ok:
# lspci
0000:00:00.0 PCI bridge: PLDA XpressRich-AXI Ref Design (rev 02)
0000:01:00.0 USB controller: VIA Technologies, Inc. VL805/806 xHCI USB 3.0 Controller (rev 01)
0001:00:00.0 PCI bridge: PLDA XpressRich-AXI Ref Design (rev 02)
0001:01:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [Optane]
I created a single partition with an ext4 filesystem (initially tried btrfs, but the StarFive kernel doesn't support it), and kicked off a debootstrap with:
# mkfs -t ext4 /dev/nvme0n1p1
# mount /dev/nvme0n1p1 /mnt
# debootstrap --keyring=/etc/apt/trusted.gpg.d/debian-ports-archive-2023.gpg \
	unstable /mnt https://deb.debian.org/debian-ports
The u-boot setup has a convoluted set of vendor scripts that eventually ends up reading a /boot/extlinux/extlinux.conf config from /dev/mmcblk1p2, so I added an additional entry there using the StarFive kernel but pointing to the NVMe device for /. Made sure to set a root password (not that I've been bitten by that before, too many times), and rebooted. Success! Well. Sort of. I hit a bunch of problems with having a getty running on ttyS0 as well as one running on hvc0. The second turns out to be a console device from the RISC-V SBI. I did a systemctl mask serial-getty@hvc0.service which made things a bit happier, but I was still seeing odd behaviour and output. Turned out I needed to rebuild the initramfs as well; the StarFive one was using Plymouth and doing some other stuff that seemed to be confusing matters. An update-initramfs -k 5.15.0-starfive -c built me a new one and everything was happy. Next problem; the StarFive kernel doesn't have IPv6 support. StarFive are good citizens and make their 5.15 kernel tree available, so I grabbed it, fed it the existing config, and tweaked some options (including adding IPV6 and SECCOMP, which chrony wanted). Slight hiccup when it turned out trying to do things like make sound modular caused it to fail to compile, and having to backport the fix that allowed the use of GCC 12 (as present in sid), but it got there. So I got cocky and tried to update it to the latest 5.15.94. A few manual merge fixups (which I may or may not have got right, but it compiles and boots for me), and success. Timings:
$ time make -j 4 bindeb-pkg
  [linux-image-5.15.94-00787-g1fbe8ac32aa8]
real	37m0.134s
user	117m27.392s
sys	6m49.804s
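For reference, the additional extlinux.conf entry I describe above would look something like the following sketch (the label, kernel/initrd filenames and fdtdir path here are illustrative rather than copied from my actual config):
label debian-nvme
	kernel /vmlinuz-5.15.0-starfive
	initrd /initrd.img-5.15.0-starfive
	fdtdir /dtbs
	append root=/dev/nvme0n1p1 rw console=ttyS0,115200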
On the subject of kernels I am pleased to note that there are efforts to upstream the VisionFive 2 support, with what appears to be multiple members of StarFive engaging in multiple patch submission rounds. It's really great to see this and I look forward to being able to run an unmodified mainline kernel on my board. Niggles? I have a few. The provided u-boot doesn't have NVMe support enabled, so at present I need to keep a MicroSD card to boot off, even though root is on an SSD. I'm also seeing some errors in dmesg from the SSD:
[155933.434038] nvme nvme0: I/O 436 QID 4 timeout, completion polled
[156173.351166] nvme nvme0: I/O 48 QID 3 timeout, completion polled
[156346.228993] nvme nvme0: I/O 108 QID 3 timeout, completion polled
It doesn't seem to cause any actual issues, and it could be the SSD, the 5.15 kernel or an actual hardware thing - I'll keep an eye on it (I will probably end up with a different SSD that actually fits, so that'll provide another data point). More annoying is the temperature the CPU seems to run at. There's no heatsink or fan, just the metal heatspreader on top of the CPU, and in normal idle operation it sits at around 60°C. Compiling a kernel it hit 90°C before I stopped the job and sorted out some additional cooling in the form of a desk fan, which kept it at just over 30°C. [Image: bare VisionFive 2 SBC board with a small desk fan pointed at it] I haven't seen any actual stability problems, but I wouldn't want to run for any length of time like that. I've ordered a heatsink and also realised that the board supports a Raspberry Pi style PoE "Hat", so I've got one of those that includes a fan ordered (I am a complete convert to PoE, especially for small systems like this). With the desk fan setup I've been able to run the board for extended periods under load (I did a full recompile of the Debian 6.1.12-1 kernel package and it took about 10 hours). The M.2 slot is unfortunately only a single PCIe v2 lane, and my testing topped out at about 180MB/s. IIRC that is about half what the slot should be capable of, and less than a 10th of what the SSD can do. Ethernet testing with iPerf3 sustained about 941Mb/s, so basically maxing out the port. The board as a whole isn't going to set any speed records, but it's perfectly usable, and pretty impressive for the price point. On the Debian side I've not hit any surprises. There's work going on to move RISC-V to a proper release architecture, and I'm hoping to be able to help out with that, but the version of unstable I installed from the ports infrastructure has looked just like any other Debian install. Which is what you want. And that pretty much sums up my overall experience of the VisionFive 2; it's not noticeably different than any other single board computer. That's a good thing, FWIW, and once the kernel support lands properly upstream (it'll be post-6.3 at least, it seems) it'll be a boring mainline supported platform that just happens to be RISC-V.
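(For the curious, the disk and network figures above come from straightforward tests - roughly the following, with the device name and target host obviously specific to my setup:)
# dd if=/dev/nvme0n1 of=/dev/null bs=1M count=4096 status=progress
$ iperf3 -c 192.168.0.2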

9 February 2023

Jonathan McDowell: Building a read-only Debian root setup: Part 2

This is the second part of how I build a read-only root setup for my router. You might want to read part 1 first, which covers the initial boot and general overview of how I tie the pieces together. This post will describe how I build the squashfs image that forms the main filesystem. Most of the build is driven from a script, make-router, which I'll dissect below. It's highly tailored to my needs, and this is a fairly lengthy post, but hopefully the steps I describe prove useful to anyone trying to do something similar.
Breakdown of make-router
#!/bin/bash
# Either rb3011 (arm) or rb5009 (arm64)
#HOSTNAME="rb3011"
HOSTNAME="rb5009"
if [ "x$ HOSTNAME " == "xrb3011" ]; then
	ARCH=armhf
elif [ "x$ HOSTNAME " == "xrb5009" ]; then
	ARCH=arm64
else
	echo "Unknown host: $ HOSTNAME "
	exit 1
fi

It's a bash script, and I allow building for either my RB3011 or RB5009, which means a different architecture (32 vs 64 bit). I run this script on my Pi 4 which means I don't have to mess about with QemuUserEmulation.
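(If I didn't have a suitable Arm host the same script could run on an amd64 box with user-mode emulation. A hypothetical sketch of the extra setup that would need, rather than anything I actually do:)
# apt install qemu-user-static binfmt-support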
BASE_DIR=$(dirname $0)
IMAGE_FILE=$(mktemp --tmpdir router.${ARCH}.XXXXXXXXXX.img)
MOUNT_POINT=$(mktemp -p /mnt -d router.${ARCH}.XXXXXXXXXX)
# Build and mount an ext4 image file to put the root file system in
dd if=/dev/zero bs=1 count=0 seek=1G of=${IMAGE_FILE}
mkfs -t ext4 ${IMAGE_FILE}
mount -o loop ${IMAGE_FILE} ${MOUNT_POINT}

I build the image in a loopback ext4 file on tmpfs (my Pi4 is the 8G model), which makes things a bit faster.
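(That relies on the image file landing on a tmpfs; if /tmp weren't already tmpfs-backed then something like the following beforehand would sort that - an assumption about the setup, not part of the script:)
# mount -t tmpfs -o size=2G tmpfs /tmp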
# Add dpkg excludes
mkdir -p ${MOUNT_POINT}/etc/dpkg/dpkg.cfg.d/
cat <<EOF > ${MOUNT_POINT}/etc/dpkg/dpkg.cfg.d/path-excludes
# Exclude docs
path-exclude=/usr/share/doc/*
# Only locale we want is English
path-exclude=/usr/share/locale/*
path-include=/usr/share/locale/en*/*
path-include=/usr/share/locale/locale.alias
# No man pages
path-exclude=/usr/share/man/*
EOF

Create a dpkg excludes config to drop docs, man pages and most locales before we even start the bootstrap.
# Setup fstab + mtab
echo "# Empty fstab as root is pre-mounted" > $ MOUNT_POINT /etc/fstab
ln -s ../proc/self/mounts $ MOUNT_POINT /etc/mtab
# Setup hostname
echo $ HOSTNAME  > $ MOUNT_POINT /etc/hostname
# Add the root SSH keys
mkdir -p $ MOUNT_POINT /root/.ssh/
cat <<EOF > $ MOUNT_POINT /root/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAv8NkUeVdsVdegS+JT9qwFwiHEgcC9sBwnv6RjpH6I4d3im4LOaPOatzneMTZlH8Gird+H4nzluciBr63hxmcFjZVW7dl6mxlNX2t/wKvV0loxtEmHMoI7VMCnrWD0PyvwJ8qqNu9cANoYriZRhRCsBi27qPNvI741zEpXN8QQs7D3sfe4GSft9yQplfJkSldN+2qJHvd0AHKxRdD+XTxv1Ot26+ZoF3MJ9MqtK+FS+fD9/ESLxMlOpHD7ltvCRol3u7YoaUo2HJ+u31l0uwPZTqkPNS9fkmeCYEE0oXlwvUTLIbMnLbc7NKiLgniG8XaT0RYHtOnoc2l2UnTvH5qsQ== noodles@earth.li
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDQb9+qFemcwKhey3+eTh5lxp+3sgZXW2HQQEZMt9hPvVXk+MiiNMx9WUzxPJnwXqlmmVdKsq+AvjA0i505Pp8fIj5DdUBpSqpLghmzpnGuob7SSwXYj+352hjD52UC4S0KMKbIaUpklADgsCbtzhYYc4WoO8F7kK63tS5qa1XSZwwRwPbYOWBcNocfr9oXCVWD9ismO8Y0l75G6EyW8UmwYAohDaV83pvJxQerYyYXBGZGY8FNjqVoOGMRBTUcLj/QTo0CDQvMtsEoWeCd0xKLZ3gjiH3UrknkaPra557/TWymQ8Oh15aPFTr5FvKgAlmZaaM0tP71SOGmx7GpCsP4jZD1Xj/7QMTAkLXb+Ou6yUOVM9J4qebdnmF2RGbf1bwo7xSIX6gAYaYgdnppuxqZX1wyAy+A2Hie4tUjMHKJ6OoFwBsV1sl+3FobrPn6IuulRCzsq2aLqLey+PHxuNAYdSKo7nIDB3qCCPwHlDK52WooSuuMidX4ujTUw7LDTia9FxAawudblxbrvfTbg3DsiDBAOAIdBV37HOAKu3VmvYSPyqT80DEy8KFmUpCEau59DID9VERkG6PWPVMiQnqgW2Agn1miOBZeIQV8PFjenAySxjzrNfb4VY/i/kK9nIhXn92CAu4nl6D+VUlw+IpQ8PZlWlvVxAtLonpjxr9OTw== noodles@yubikey
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC0I8UHj4IpfqUcGE4cTvLB0d2xmATSUzqtxW6ZhGbZxvQDKJesVW6HunrJ4NFTQuQJYgOXY/o82qBpkEKqaJMEFHTCjcaj3M6DIaxpiRfQfs0nhtzDB6zPiZn9Suxb0s5Qr4sTWd6iI9da72z3hp9QHNAu4vpa4MSNE+al3UfUisUf4l8TaBYKwQcduCE0z2n2FTi3QzmlkOgH4MgyqBBEaqx1tq7Zcln0P0TYZXFtrxVyoqBBIoIEqYxmFIQP887W50wQka95dBGqjtV+d8IbrQ4pB55qTxMd91L+F8n8A6nhQe7DckjS0Xdla52b9RXNXoobhtvx9K2prisagsHT noodles@cup
ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBK6iGog3WbNhrmrkglNjVO8/B6m7mN6q1tMm1sXjLxQa+F86ETTLiXNeFQVKCHYrk8f7hK0d2uxwgj6Ixy9k0Cw= noodles@sevai
EOF

Setup fstab, the hostname and SSH keys for root.
# Bootstrap our install
debootstrap \
	--arch=${ARCH} \
	--include=collectd-core,conntrack,dnsmasq,ethtool,iperf3,kexec-tools,mosquitto,mtd-utils,mtr-tiny,ppp,tcpdump,rng-tools5,ssh,watchdog,wget \
	--exclude=dmidecode,isc-dhcp-client,isc-dhcp-common,makedev,nano \
	bullseye ${MOUNT_POINT} https://deb.debian.org/debian/

Actually do the debootstrap step, including a bunch of extra packages that we want.
# Install mqtt-arp
cp ${BASE_DIR}/debs/mqtt-arp_1_${ARCH}.deb ${MOUNT_POINT}/tmp
chroot ${MOUNT_POINT} dpkg -i /tmp/mqtt-arp_1_${ARCH}.deb
rm ${MOUNT_POINT}/tmp/mqtt-arp_1_${ARCH}.deb
# Frob the mqtt-arp config so it starts after mosquitto
sed -i -e 's/After=.*/After=mosquitto.service/' ${MOUNT_POINT}/lib/systemd/system/mqtt-arp.service

I haven't uploaded mqtt-arp to Debian, so I install a locally built package, and ensure it starts after mosquitto (the MQTT broker), given they're running on the same host.
# Frob watchdog so it starts earlier than multi-user
sed -i -e 's/After=.*/After=basic.target/' ${MOUNT_POINT}/lib/systemd/system/watchdog.service
# Make sure the watchdog is poking the device file
sed -i -e 's/^#watchdog-device/watchdog-device/' ${MOUNT_POINT}/etc/watchdog.conf

watchdog timeouts were particularly an issue on the RB3011, where the default timeout didn't give enough time to reach multiuser mode before it would reset the router. Not helpful, so alter the config to start it earlier (and make sure it's configured to actually kick the device file).
# Clean up docs + locales
rm -r ${MOUNT_POINT}/usr/share/doc/*
rm -r ${MOUNT_POINT}/usr/share/man/*
for dir in ${MOUNT_POINT}/usr/share/locale/*/; do
	if [ "${dir}" != "${MOUNT_POINT}/usr/share/locale/en/" ]; then
		rm -r ${dir}
	fi
done

Clean up any docs etc that ended up installed.
# Set root password to root
echo "root:root"   chroot $ MOUNT_POINT  chpasswd

The only login method is ssh key to the root account, though I suppose this allows for someone to execute a privilege escalation from a daemon user so I should probably randomise this. Does need to be known though, so it's possible to login via the serial console for debugging.
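If I do get around to randomising it, it would be a small tweak along these lines (a sketch, not part of the current script):
# Hypothetical: generate a random root password and print it once at build time
ROOT_PW=$(tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 16)
echo "Generated root password: ${ROOT_PW}"
echo "root:${ROOT_PW}" | chroot ${MOUNT_POINT} chpasswd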
# Add security to sources.list + update
echo "deb https://security.debian.org/debian-security bullseye-security main" >> $ MOUNT_POINT /etc/apt/sources.list
chroot $ MOUNT_POINT  apt update
chroot $ MOUNT_POINT  apt -y full-upgrade
chroot $ MOUNT_POINT  apt clean
# Cleanup the APT lists
rm $ MOUNT_POINT /var/lib/apt/lists/www.*
rm $ MOUNT_POINT /var/lib/apt/lists/security.*

Pull in any security updates, then clean out the APT lists rather than polluting the image with them.
# Disable the daily APT timer
rm ${MOUNT_POINT}/etc/systemd/system/timers.target.wants/apt-daily.timer
# Disable daily dpkg backup
cat <<EOF > ${MOUNT_POINT}/etc/cron.daily/dpkg
#!/bin/sh
# Don't do the daily dpkg backup
exit 0
EOF
# We don't want a persistent systemd journal
rmdir ${MOUNT_POINT}/var/log/journal

None of these make sense on a router.
# Enable nftables
ln -s /lib/systemd/system/nftables.service \
	${MOUNT_POINT}/etc/systemd/system/sysinit.target.wants/nftables.service

Ensure we have firewalling enabled automatically.
# Add systemd-coredump + systemd-timesync user / group
echo "systemd-timesync:x:998:" >> $ MOUNT_POINT /etc/group
echo "systemd-coredump:x:999:" >> $ MOUNT_POINT /etc/group
echo "systemd-timesync:!*::" >> $ MOUNT_POINT /etc/gshadow
echo "systemd-coredump:!*::" >> $ MOUNT_POINT /etc/gshadow
echo "systemd-timesync:x:998:998:systemd Time Synchronization:/:/usr/sbin/nologin" >> $ MOUNT_POINT /etc/passwd
echo "systemd-coredump:x:999:999:systemd Core Dumper:/:/usr/sbin/nologin" >> $ MOUNT_POINT /etc/passwd
echo "systemd-timesync:!*:47358::::::" >> $ MOUNT_POINT /etc/shadow
echo "systemd-coredump:!*:47358::::::" >> $ MOUNT_POINT /etc/shadow
# Create /etc/.pwd.lock, otherwise it'll end up in the overlay
touch $ MOUNT_POINT /etc/.pwd.lock
chmod 600 $ MOUNT_POINT /etc/.pwd.lock

Create a number of users that will otherwise get created at boot, and a lock file that will otherwise get created anyway.
# Copy config files
cp --recursive --preserve=mode,timestamps ${BASE_DIR}/etc/* ${MOUNT_POINT}/etc/
cp --recursive --preserve=mode,timestamps ${BASE_DIR}/etc-${ARCH}/* ${MOUNT_POINT}/etc/
chroot ${MOUNT_POINT} chown mosquitto /etc/mosquitto/mosquitto.users
chroot ${MOUNT_POINT} chown mosquitto /etc/ssl/mqtt.home.key

There are config files that are easier to replace wholesale, some of which are specific to the hardware (e.g. related to network interfaces). See below for some more details.
# Build symlinks into flash for boot / modules
ln -s /mnt/flash/lib/modules ${MOUNT_POINT}/lib/modules
rmdir ${MOUNT_POINT}/boot
ln -s /mnt/flash/boot ${MOUNT_POINT}/boot

The kernel + its modules live outside the squashfs image, on the USB flash drive that the image lives on. That makes for easier kernel upgrades.
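In practice that means a kernel upgrade on the running router goes roughly like the following sketch (package filename and version are illustrative; /mnt/flash is normally mounted read-only, as covered in part 1):
# mount -o remount,rw /mnt/flash
# dpkg-deb -x linux-image-6.1.0_arm64.deb /tmp/kernel
# cp -r /tmp/kernel/lib/modules/6.1.0 /mnt/flash/lib/modules/
# cp /tmp/kernel/boot/vmlinuz-6.1.0 /mnt/flash/boot/
# mount -o remount,ro /mnt/flash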
# Put our git revision into os-release
echo -n "GIT_VERSION=" >> $ MOUNT_POINT /etc/os-release
(cd $ BASE_DIR  ; git describe --tags) >> $ MOUNT_POINT /etc/os-release

Always helpful to be able to check the image itself for what it was built from.
# Add some stuff to root's .bashrc
cat << EOF >> ${MOUNT_POINT}/root/.bashrc
alias ls='ls -F --color=auto'
eval "\$(dircolors)"
case "\$TERM" in
xterm*|rxvt*)
	PS1="\\[\\e]0;\\u@\\h: \\w\a\\]\$PS1"
	;;
*)
	;;
esac
EOF

Just some niceties for when I do end up logging in.
# Build the squashfs
mksquashfs ${MOUNT_POINT} /tmp/router.${ARCH}.squashfs \
	-comp xz

Actually build the squashfs image.
# Save the installed package list off
chroot ${MOUNT_POINT} dpkg --get-selections > /tmp/wip-installed-packages

Save off the installed package list. This was particularly useful when trying to replicate the existing router setup and making sure I had all the important packages installed. It doesn't really serve a purpose now.
In terms of the config files I copy into /etc, shared across both routers are the following:
Breakdown of shared config
  • apt config (disable recommends, periodic updates):
    • apt/apt.conf.d/10periodic, apt/apt.conf.d/local-recommends
  • Adding a default, empty, locale:
    • default/locale
  • DNS/DHCP:
    • dnsmasq.conf, dnsmasq.d/dhcp-ranges, dnsmasq.d/static-ips
    • hosts, resolv.conf
  • Enabling IP forwarding:
    • sysctl.conf (see the sketch after this list)
  • Logs related:
    • logrotate.conf, rsyslog.conf
  • MQTT related:
    • mosquitto/mosquitto.users, mosquitto/conf.d/ssl.conf, mosquitto/conf.d/users.conf, mosquitto/mosquitto.acl, mosquitto/mosquitto.conf
    • mqtt-arp.conf
    • ssl/lets-encrypt-r3.crt, ssl/mqtt.home.key, ssl/mqtt.home.crt
  • PPP configuration:
    • ppp/ip-up.d/0000usepeerdns, ppp/ipv6-up.d/defaultroute, ppp/pap-secrets, ppp/chap-secrets
    • network/interfaces.d/pppoe-wan
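The IP forwarding piece is just a couple of lines of sysctl.conf - a sketch of the relevant settings (illustrative rather than my exact config):
net.ipv4.ip_forward=1
net.ipv6.conf.all.forwarding=1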
The router specific config is mostly related to networking:
Breakdown of router specific config
  • Firewalling:
    • nftables.conf
  • Interfaces:
    • dnsmasq.d/interfaces
    • network/interfaces.d/eth0, network/interfaces.d/p1, network/interfaces.d/p2, network/interfaces.d/p7, network/interfaces.d/p8
  • PPP config (network interface piece):
    • ppp/peers/aquiss
  • SSH keys:
    • ssh/ssh_host_ecdsa_key, ssh/ssh_host_ed25519_key, ssh/ssh_host_rsa_key, ssh/ssh_host_ecdsa_key.pub, ssh/ssh_host_ed25519_key.pub, ssh/ssh_host_rsa_key.pub
  • Monitoring:
    • collectd/collectd.conf, collectd/collectd.conf.d/network.conf

31 January 2023

Jonathan McDowell: Enabling retrogaming with Kodi on Debian

For some reason my son has started to be really into watching playthroughs of Mario and similar games on YouTube. I don't understand the appeal, but it's less distracting as background than Paw Patrol, so I'm not complaining. He's not quite at the stage he's ready to play the games himself, but it's coming. So I figured it would be neat to sort out some retrogaming bits ready for when that happens. I already have a Kodi box underneath the TV; it doesn't get as much use these days as a lot of our viewing is through commercial streaming services, but it's got all of our DVDs ripped so is still useful. Recent versions of Kodi have support for games as well, so I decided it would be perfect if I could tie in to that. However. The normal way of doing this seems to be to download someone's pre-rolled setup, and I'd much rather be able to get the bits I need from Debian, as that's what the machine is running (it does a few minor things other than Kodi). The best retrogaming environment out there seems to be RetroArch. It's available for Linux/OS X/Windows and RetroPie provides a nice easy standalone setup if you're not interested in the Kodi side. If you are then game.libretro provides a wrapper for libretro cores under Kodi. This seemed like the right track. Unfortunately RetroArch and related packages were in need of some love in Debian. So I ended up engaging in some yak shaving to try and get to where I wanted to be. First up was RetroArch itself, which was over 4 years out of date, at 1.7.3. It turned out that wanted an updated assets package which contains the necessary icons etc for the interface. RetroArch is only a frontend. To actually play games you need a suitable core. The first one I tried was genesisplusgx (I have fond memories of Sonic from the Master System era), which again was several years out of date. I pulled recent git (I wish folk would tag releases at least every now and then) and updated things. And successfully managed to play Sonic (badly, I am way out of practice). genesisplusgx is in non-free, due to a prohibition on commercial distribution. So it's not actually part of Debian. I switched my attention to libretro-bsnes-mercury, which would then allow SNES emulation and is part of main. Again, not too hard to update, some packaging cleanups, and I was playing Super Mario. Again, badly. That meant I knew I had working emulation with libretro cores. It was time to integrate with Kodi. That meant taking game.libretro, filing an ITP and doing a bunch of bits to get it ready to upload (including introducing a retroarch-dev binary package that contains the appropriate include files as part of retroarch). It sat in NEW for a while (including an initial reject because I'd missed an attribution in the debian/copyright), and was accepted yesterday. There's a final piece of the puzzle, and that's the Kodi config that ties together the libretro core with game.libretro and presents the emulator to Kodi as a fully fledged add-on. The kodi-game folk have a neat tool, kodi-game-scripting, which automates the heavy lifting of producing this config. I've done some local modifications that make it a bit more useful for producing config that can be embedded in the Debian libretro-* packages directly, which I should upload somewhere but it's all a bit rough 'n' ready at present. It was enough to allow me to produce some kodi-game-libretro-bsnes-* packages as part of libretro-bsnes-mercury. With that complete, all the packages needed for playing SNES games under Kodi are now present in Debian.
I need to upload libretro-bsnes-mercury to unstable (it went to experimental while waiting for kodi-game-libretro to be accepted), and kodi-game-libretro needs another source-only upload, but once that's done both should be in good shape to migrate to testing and be part of the upcoming bookworm release. What else is there to do? I'd like to get Kodi config included in the other libretro packages that are already part of Debian. That's going to need the Controller Topology Project to be packaged so that the controller details are available (I was lucky in that the SNES controller is already part of the Kodi package). I need to work out if I can turn kodi-game-scripting into some sort of dh helper to help automate things. But I've done some local testing with genesisplusgx and it works fine as expected. The other thing is that games are not yet first class citizens in Kodi; the normal browser interface you get for movies, music and TV shows is not available for games. Currently I've been trying out the ROM Collection Browser, though I find its automated scraping isn't as good as I'd like. A friend has recommended the Advanced Emulator Launcher but I haven't taken a look at it. Either way I'd like to ultimately get one of them packaged up as well, though not in time for bookworm. Anyway. My hope is that these updated and new packages prove useful to someone else. You can tell I'm biased towards 90s era consoles, but if you've enough CPU grunt there are a bunch of more recent cores available too. Big thanks to the Debian FTP Master team for letting these through NEW so close to release. And all the upstream devs - RetroArch is a great framework from my perspective as a user, and the Kodi Game folk have done massive amounts of work that made my life much easier when preparing things for Debian.

12 January 2023

Jonathan McDowell: Building a read-only Debian root setup: Part 1

I mentioned in the post about upgrading my home internet that part of the work I did was creating a read-only Debian root with a squashfs image. This post covers the details of how I boot with that image; a later post will cover how I build the squashfs image. First, David Reader kindly pointed me at his rodebian setup, which was helpful in making me think about the whole problem but ultimately not the direction I went. Primarily because on the old router (an RB3011) I am space constrained, with only 120M of usable flash, and so ideally I wanted as much as possible of the system in a well compressed filesystem. squashfs seemed like the best option for that, and ultimately I ended up with a 39M image. I've then used overlayfs to mount a tmpfs, so I get what looks like a writeable system without having to do too many tweaks to the actual install. On the plus side I can then see exactly what is getting written where and decide whether I need to update something in the squashfs. I don't boot with an initrd - for initial testing I booted directly off a USB stick. I've actually ended up continuing to do this in production, because I've had no pressing reason to move it all to booting off internal flash (I've ended up with a Sandisk SDCZ430-032G-G46 which is tiny). However nothing I'm going to describe is dependent on that - this would work perfectly well for an initial UBIFS rootfs on internal NAND. So the basic overview is I boot off a minimal rootfs, mount a squashfs, create an appropriate tmpfs, mount an overlayfs that combines the two, then pivot_root into the overlayfs and exec its init so it becomes the rootfs. For the minimal rootfs I started with busybox, in particular I used the armhf busybox-static package from Debian. My RB5009 is an ARM64, but I wanted to be able to test on the RB3011 as well, which is ARMv7. Picking an armhf binary for the minimal rootfs lets me use the same image for both. Using the static build helps reduce the number of pieces involved in putting it all together. The busybox binary goes in /bin. I was able to cheat and chroot into the empty rootfs and call busybox --install -s to create symlinks for all the tools it provides, but I could have done this manually. There's only a handful that are actually needed, but it's amazing how much is crammed into a 1.2M binary. /sbin/init is a shell script:
Contents
#!/bin/ash
# Make sure we have a sane date
if [ -e /data/saved-date ]; then
        CURRENT_DATE=$(date -Iseconds)
        if [ "$ CURRENT_DATE:0:4 " -lt "2022" -o \
                        "$ CURRENT_DATE:0:4 " -gt "2030" ]; then
                echo Setting initial date
                date -s "$(cat /data/saved-date)"
        fi
fi
# Work out what platform we're on
ARCH=$(uname -m)
if [ "$ ARCH " == "aarch64" ]; then
        ARCH=arm64
else
        ARCH=armhf
fi
# Mount a tmpfs to store the changes
mount -t tmpfs root-rw /mnt/overlay/rw
# Make the directories we need in the tmpfs
mkdir /mnt/overlay/rw/upper
mkdir /mnt/overlay/rw/work
# Mount the squashfs and build an overlay root filesystem of it + the tmpfs
mount -t squashfs -o loop /data/router.${ARCH}.squashfs /mnt/overlay/lower
mount -t overlay \
        -o lowerdir=/mnt/overlay/lower,upperdir=/mnt/overlay/rw/upper,workdir=/mnt/overlay/rw/work \
        overlayfs-root /mnt/root
# Build the directories we need within the new root
mkdir /mnt/root/mnt/flash
mkdir /mnt/root/mnt/overlay
mkdir /mnt/root/mnt/overlay/lower
mkdir /mnt/root/mnt/overlay/rw
# Copy any stored state
if [ -e /data/state.${ARCH}.tar ]; then
        echo Restoring stored state
        cd /mnt/root
        tar xf /data/state.${ARCH}.tar
fi
cd /mnt/root
pivot_root . mnt/flash
echo Switching into root filesystem
exec chroot . sh -c "$(cat <<END
mount --move /mnt/flash/mnt/overlay/lower /mnt/overlay/lower
mount --move /mnt/flash/mnt/overlay/rw /mnt/overlay/rw
exec /sbin/init
END
)"
Most of what the script is doing is sorting out the squashfs + tmpfs backed overlayfs that becomes the full root filesystem, but there are a few other bits to note. First, we pick up a saved date from /data/saved-date - the router has no RTC and while it'll sort itself out with NTP once it gets networking up, it's useful to make sure we don't end up comically far in the past or future. Second, the script looks at what architecture we're running and picks up an appropriate squashfs image from /data based on that. This let me use the same USB stick for testing on both the RB3011 and the RB5009. Finally we allow for a /data/state.${ARCH}.tar file to let us pick up changes to the rootfs at boot time - this prevents having to rebuild the squashfs image every time there's a persistent change. The other piece that doesn't show up in the script is that the kernel and its modules are all installed into this initial rootfs (and then symlinked from the squashfs). This lets me build a mostly modular kernel, as long as all the necessary drivers to mount the USB stick are built in. Once the system is fully booted the initial rootfs is available at /mnt/flash, by default mounted read-only (to avoid inadvertent writes), but able to be remounted to update the squashfs image, install a new kernel, or update the state tarball. /mnt/overlay/rw/upper/ is where updates to the overlayfs are written, which provides an easy way to see what files are changing, initially to determine what might need tweaked in the squashfs creation process and subsequently to be able to see what needs updated in the state tarball.
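For completeness, assembling the minimal busybox rootfs described above goes roughly like this (a sketch; the deb filename is illustrative and the directory list is just what the init script expects to exist):
$ mkdir -p rootfs/bin rootfs/sbin rootfs/usr/bin rootfs/usr/sbin \
	rootfs/dev rootfs/proc rootfs/data rootfs/mnt/root \
	rootfs/mnt/overlay/lower rootfs/mnt/overlay/rw
$ dpkg-deb -x busybox-static_1.30.1-6_armhf.deb /tmp/busybox
$ cp /tmp/busybox/bin/busybox rootfs/bin/
# chroot rootfs /bin/busybox --install -s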

6 January 2023

Jonathan McDowell: Finally making use of bpftrace

I am old enough to remember when BPF meant the traditional Berkeley Packet Filter, and was confined to filtering network packets. It's grown into much, much, more as eBPF and getting familiar with it so that I can add it to the suite of tips and tricks I can call upon has been on my to-do list for a while. To this end I was lucky enough to attend a live walk through of bpftrace last year. bpftrace is a high level tool that allows the easy creation and execution of eBPF tracers under Linux. Recently I've been working on updating the RetroArch packages in Debian and as I was doing so I realised there was a need to update the quite outdated retroarch-assets package, which contains various icons and images used for the user interface. I wanted to try and re-generate as many of the artefacts as I could, to ensure the proper source was available. However it wasn't always clear which files were actually needed and which were either source or legacy. So I wanted to trace file opens by retroarch and see when it was failing to find files. Traditionally this is something I'd have used strace for, but it seemed like a great opportunity to try out bpftrace. It turns out bpftrace ships with an example, opensnoop.bt, which provided details of hooking the open syscall entry + exit and providing details of all files opened on the system. I only wanted to track opens by the retroarch binary that failed, so I made a couple of modifications:
retro-failed-open-snoop.bt
#!/usr/bin/env bpftrace
/*
 * retro-failed-open-snoop - snoop failed opens by RetroArch
 *
 * Based on:
 * opensnoop	Trace open() syscalls.
 *		For Linux, uses bpftrace and eBPF.
 *
 * Copyright 2018 Netflix, Inc.
 * Licensed under the Apache License, Version 2.0 (the "License")
 *
 * 08-Sep-2018	Brendan Gregg	Created this.
 */
BEGIN
{
	printf("Tracing open syscalls... Hit Ctrl-C to end.\n");
	printf("%-6s %-16s %3s %s\n", "PID", "COMM", "ERR", "PATH");
}

tracepoint:syscalls:sys_enter_open,
tracepoint:syscalls:sys_enter_openat
{
	@filename[tid] = args->filename;
}

tracepoint:syscalls:sys_exit_open,
tracepoint:syscalls:sys_exit_openat
/@filename[tid]/
{
	$ret = args->ret;
	$errno = $ret > 0 ? 0 : - $ret;

	if (($ret <= 0) && (strncmp("retroarch", comm, 9) == 0)) {
		printf("%-6d %-16s %3d %s\n", pid, comm, $errno,
		    str(@filename[tid]));
	}

	delete(@filename[tid]);
}

END
{
	clear(@filename);
}
I had to install bpftrace (apt install bpftrace) and then I ran bpftrace -o retro.log retro-failed-open-snoop.bt as root and fired up retroarch as a normal user.
bpftrace failed open log for retroarch
Attaching 6 probes...
Tracing open syscalls... Hit Ctrl-C to end.
PID    COMM             ERR PATH
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/pulseaudio/glibc-hwcaps/x86-64-v2/lib
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/pulseaudio/tls/x86_64/x86_64/libpulse
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/pulseaudio/tls/x86_64/libpulsecommon-
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/pulseaudio/tls/x86_64/libpulsecommon-
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/pulseaudio/tls/libpulsecommon-16.1.so
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/pulseaudio/x86_64/x86_64/libpulsecomm
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/pulseaudio/x86_64/libpulsecommon-16.1
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/pulseaudio/x86_64/libpulsecommon-16.1
3394   retroarch          2 /etc/gcrypt/hwf.deny
3394   retroarch          2 /lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v2/libgamemode.so.0
3394   retroarch          2 /lib/x86_64-linux-gnu/tls/x86_64/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/x86_64-linux-gnu/tls/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/x86_64-linux-gnu/tls/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/x86_64-linux-gnu/tls/libgamemode.so.0
3394   retroarch          2 /lib/x86_64-linux-gnu/x86_64/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/x86_64-linux-gnu/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/x86_64-linux-gnu/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/x86_64-linux-gnu/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v2/libgamemode.so
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/tls/x86_64/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/tls/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/tls/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/tls/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/x86_64/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/libgamemode.so.0
3394   retroarch          2 /lib/glibc-hwcaps/x86-64-v2/libgamemode.so.0
3394   retroarch          2 /lib/tls/x86_64/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/tls/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/tls/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/tls/libgamemode.so.0
3394   retroarch          2 /lib/x86_64/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/libgamemode.so.0
3394   retroarch          2 /usr/lib/glibc-hwcaps/x86-64-v2/libgamemode.so.0
3394   retroarch          2 /usr/lib/tls/x86_64/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/tls/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/tls/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/tls/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/libgamemode.so.0
3394   retroarch          2 /lib/x86_64-linux-gnu/libgamemode.so
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/libgamemode.so
3394   retroarch          2 /lib/libgamemode.so
3394   retroarch          2 /usr/lib/libgamemode.so
3394   retroarch          2 /lib/x86_64-linux-gnu/libdecor-0.so
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/libdecor-0.so
3394   retroarch          2 /lib/libdecor-0.so
3394   retroarch          2 /usr/lib/libdecor-0.so
3394   retroarch          2 /etc/drirc
3394   retroarch          2 /home/noodles/.drirc
3394   retroarch          2 /etc/drirc
3394   retroarch          2 /home/noodles/.drirc
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/dri/tls/iris_dri.so
3394   retroarch          2 /lib/x86_64-linux-gnu/../lib/glibc-hwcaps/x86-64-v2/libedit.so.
3394   retroarch          2 /lib/x86_64-linux-gnu/../lib/tls/x86_64/x86_64/libedit.so.2
3394   retroarch          2 /lib/x86_64-linux-gnu/../lib/tls/x86_64/libedit.so.2
3394   retroarch          2 /lib/x86_64-linux-gnu/../lib/tls/x86_64/libedit.so.2
3394   retroarch          2 /lib/x86_64-linux-gnu/../lib/tls/libedit.so.2
3394   retroarch          2 /lib/x86_64-linux-gnu/../lib/x86_64/x86_64/libedit.so.2
3394   retroarch          2 /lib/x86_64-linux-gnu/../lib/x86_64/libedit.so.2
3394   retroarch          2 /lib/x86_64-linux-gnu/../lib/x86_64/libedit.so.2
3394   retroarch          2 /lib/x86_64-linux-gnu/../lib/libedit.so.2
3394   retroarch          2 /etc/drirc
3394   retroarch          2 /home/noodles/.drirc
3394   retroarch          2 /etc/drirc
3394   retroarch          2 /home/noodles/.drirc
3394   retroarch          2 /etc/drirc
3394   retroarch          2 /home/noodles/.drirc
3394   retroarch          2 /home/noodles/.Xdefaults-udon
3394   retroarch          2 /home/noodles/.icons/default/cursors/00000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/default/index.theme
3394   retroarch          2 /usr/share/icons/default/cursors/000000000000000000000000000000
3394   retroarch          2 /usr/share/pixmaps/default/cursors/0000000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/Adwaita/cursors/00000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/Adwaita/index.theme
3394   retroarch          2 /usr/share/icons/Adwaita/cursors/000000000000000000000000000000
3394   retroarch          2 /usr/share/pixmaps/Adwaita/cursors/0000000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/hicolor/cursors/00000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/hicolor/index.theme
3394   retroarch          2 /usr/share/icons/hicolor/cursors/000000000000000000000000000000
3394   retroarch          2 /usr/share/pixmaps/hicolor/cursors/0000000000000000000000000000
3394   retroarch          2 /usr/share/pixmaps/hicolor/index.theme
3394   retroarch          2 /home/noodles/.XCompose
3394   retroarch          2 /home/noodles/.icons/default/cursors/00000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/default/index.theme
3394   retroarch          2 /usr/share/icons/default/cursors/000000000000000000000000000000
3394   retroarch          2 /usr/share/pixmaps/default/cursors/0000000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/Adwaita/cursors/00000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/Adwaita/index.theme
3394   retroarch          2 /usr/share/icons/Adwaita/cursors/000000000000000000000000000000
3394   retroarch          2 /usr/share/pixmaps/Adwaita/cursors/0000000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/hicolor/cursors/00000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/hicolor/index.theme
3394   retroarch          2 /usr/share/icons/hicolor/cursors/000000000000000000000000000000
3394   retroarch          2 /usr/share/pixmaps/hicolor/cursors/0000000000000000000000000000
3394   retroarch          2 /usr/share/pixmaps/hicolor/index.theme
3394   retroarch          2 /usr/share/libretro/assets/xmb/monochrome/png/disc.png
3394   retroarch          2 /usr/share/libretro/assets/xmb/monochrome/sounds
3394   retroarch          2 /usr/share/libretro/assets/sounds
3394   retroarch          2 /sys/class/power_supply/ACAD
3394   retroarch          2 /sys/class/power_supply/ACAD
3394   retroarch          2 /usr/share/libretro/assets/xmb/monochrome/png/disc.png
3394   retroarch          2 /usr/share/libretro/assets/ozone/sounds
3394   retroarch          2 /usr/share/libretro/assets/sounds
This was incredibly useful - the only theme image I was missing is disc.png from XMB Monochrome (which fails to have SVG source). I also discovered the runtime optional loading of GameMode. This is available in Debian so it was a simple matter to add libgamemode0 to the binary package Recommends. So, a very basic example of using bpftrace, but a remarkably useful intro to it from my point of view!
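(For comparison, the strace approach I'd traditionally have reached for would have been something along these lines - an untested sketch:)
$ strace -f -e trace=openat -o retro-strace.log retroarch
$ grep ENOENT retro-strace.log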

2 January 2023

Gunnar Wolf: Refueling the blog

So, it's this weird time of year where we take stock and share with the world some ideas about the future. And yes, it's time to take care of this blog, as its activity has dropped once again. So maybe it'd be nice to start this post by checking how much I have blogged over the years: [chart of posts per year] (yes, this is an old blog) So yes, there is a clear downwards trend towards the last few years. And it does make sense, all in all: Not only have I managed to keep myself busier than before, but blogging is a social endeavor. And as people have moved over to the different flavors of social networks, there is somewhat less fueling us to share our thoughts and experiences in this fashion. So this connects me to my first point: Staring at Noodles' Emptiness, I came across a campaign to Bring Back Blogging. I stand by all of what they suggest: Blogs are a great invention, they allow the sharing of a great insight into a person's mind, ideas and worldview (and even more so if, like mine, it shows already a window of almost two decades of life! This year my blog will be old enough to vote!), they are completely decentralized, and can be easily grouped according to each reader's preferences via the RSS format. Anyway, I do want to write a post summing up 2022, as well as sharing some hopes and projects I have for 2023. But I don't want to make it too long to read. So that shall be the blog post for today! (post caption image by Walt Stoneburner on Flickr; CC BY 2.0)

1 January 2023

Jonathan McDowell: Free Software Activities for 2022

There is a move to Bring Back Blogging and having recently sorted out my own FreshRSS install I am completely in favour of such a thing. RSS feeds with complete posts, for preference, not just a teaser intro sentence/paragraph. It's also a reminder to me that I should blog more, and what better way to start 2023 than with my traditional recap of my Free Software activities in 2022. For previous years see 2019, 2020 + 2021.

Conferences

I attended DebConf22 in Prizren, Kosova this year, and finally hit the end of my luck in avoiding COVID. 0/10, would not recommend. Thankfully I didn't come down with symptoms until I got home (I felt fine and tested negative on arrival home, then started to feel terrible the next day and tested again), so I was able to enjoy the conference itself. I also made it to Linux Security Summit Europe 2022, which aligned with work related bits and was interesting. I suspect I would have been better going to LPC 2022 for the hallway track, though I did manage to get some overlap with folk being in town given that both were the same week.

Debian

Most of my contributions to Free software continue to happen within Debian. We continue to operate a roughly 3 month rotation for Debian Keyring in terms of handling the regular updates, and I dealt with 2022.03.24, 2022.06.26, 2022.08.11, 2022.09.24, 2022.09.25 + 2022.12.24. There were a few out of cycle updates this year and I handled a couple of them. My other contributions are largely within the Debian Electronics Packaging Team. gcc-xtensa-lx106 saw a few updates, to GCC 11 + enabling D (10 + 11), then to GCC 12 (12). binutils-xtensa-lx106 got some minor packaging cleanups, which also served to force a rebuild with the current binutils source (5). libsigrokdecode got an upload to enable building with Python 3.10 (0.5.3-3). Related, I updated sdcc to a new upstream version (4.2.0+dfsg-1) - it's used for the sigrok-firmware-fx2lafw package and I do have a tendency to play with microcontrollers, so it's good to have a recent version available in the archive. I continue to pay attention to OpenOCD, with a minor set of updates to pull in some fixes from master (0.11.0-2). I was pleased to see the release process for 0.12.0 kick off and have been uploading RCs as they come out (0.12.0~rc1-1, 0.12.0~rc2-1 + 0.12.0~rc3-1). Upstream have been interested in the upcoming bookworm release cycle and I'm hopeful we'll get 0.12.0 proper in before freeze. libjaylink also saw an upstream release (0.3.1-1). Package upload sponsorship isn't normally something I get involved with, because I find I have to spend a lot of time checking over things before I'm comfortable doing the upload. However I did sponsor an initial upload for sugarjar and an update for mgba (0.10.0+dsfg-1, currently stuck in NEW). Credit to Michel for dealing swiftly with my review comments, and Ryan for producing a nicely reviewable set of changes. As part of the Data Protection Team I responded to various inbound queries to that team. There was also some discussion on debian-vote as part of the DPL election that I engaged with, as well as discussions at DebConf about how we can do things better. For Debian New Members I'm mostly inactive as an application manager - we generally seem to have enough available recently. If that changes I'll look at stepping in to help, but I don't see that happening (it got close this year but several people had stood up before I got around to offering). I continue to be involved in Front Desk, having various conversations throughout the year with the rest of the team and occasionally approving some of the checks for new applicants. Towards the end of the year I got involved with the Debian Games Team, largely because I'm keen to try and get my Kodi working with libretro based emulators - I'd really like to be able to play old style games from the same interface as I can engage with locally stored movies, music and TV. It turns out there are a lot of moving pieces to make that happen, some missing from Debian and others in need of some TLC. I updated retroarch to current upstream (1.13.0+dfsg-1 + 1.13.0+dfsg-2) but while I was doing so upstream did another release. I plan on uploading 1.14.0 once 1.13.0 has migrated to testing. It turned out I also needed to update libretro-core-info (1.13.0-1) and retroarch-assets (1.7.6+git20221024+dfsg-1). In terms of actual emulators I pulled in new versions for genesisplusgx (1.7.4+git20221128-1) and libretro-bsnes-mercury (094+git20220807-1). On the Kodi side I haven't uploaded anything yet.
I've filed an ITP for rcheevos, which is a dependency for game.libretro, and I have a fledgling package for game.libretro that I finally got working today. I'm not sure if I can get it cleaned up enough in time to make the bookworm release, but I'm hoping that at least the libretro piece is in a bit better shape now (though I'm aware there are more emulator cores that could do with being updated).

Linux

This year was a quiet year for personal Linux contributions. I submitted a minor fix for the qca8081 PHY with speeds lower than 2.5Gb/s that caused me issues on my RB5009.

Personal projects

2022 finally saw a minor release of onak, 0.6.2, which resulted in a corresponding Debian upload (0.6.2-1). It has a couple of bug fixes but nothing major. As I said last year it's not dead, just resting, but Sequoia PGP is probably where you should be looking for a modern OpenPGP implementation. I added some basic Debian packaging to mqtt-arp - I didn't bother uploading it as it's a fairly niche package, but I'm using it locally.

12 December 2022

Jonathan McDowell: Setting up FreshRSS in a subdirectory

Ever since the demise of Google Reader I have been looking for a suitable replacement RSS reader. In the past I used to use Liferea but that was when I used a single desktop machine; these days I want to be able to read on my phone and multiple machines. I moved to Feedly and it's been mostly ok, but I'm hitting the limit of feeds available in the free tier, and $72/year is a bit more than I can justify to myself. Especially when I have machines already available to me where I could self host something. The problem, of course, is what to host. It seems the best options are all written in PHP, so I had to get over my adverse knee-jerk reaction to that. I ended up on FreshRSS but if it hadn't worked out I'd have tried TinyTinyRSS. Of course I'm hosting on Debian, and the machine I chose to use was already running nginx and PostgreSQL. So I needed to install PHP:
$ sudo apt install php7.4-fpm php-curl php-gmp php-intl php-mbstring \
	php-pgsql php-xml php-zip
I put my FreshRSS install in /srv/freshrss so I grabbed the 1.20.2 release from GitHub (actually 1.20.1 at the time, but I've upgraded to the latest since) and untarred it in there. I gave www-data access to the data directory (sudo chown -R www-data /srv/freshrss/data) (yes, yes, I could have created a new user specifically for FreshRSS, but I've chosen not to for now). There's no actual need to configure things up on the filesystem, you can do the initial setup from the web interface. Which is where the trouble came. I've been an Apache user since 1998 and as a result it's what I know and what I go to. nginx is new to me. And I wanted my FreshRSS instance to live in a subdirectory of an existing TLS-enabled host, rather than have its own hostname. Now, at least FreshRSS copes with this (unlike far too many other projects), you just have to configure your webserver correctly. Which took me more experimentation than I'd like, but I've ended up with the following snippet:
    # PHP files handling
    location ~ ^/freshrss/.+?\.php(/.*)?$ {
        root /srv/freshrss/p;
        fastcgi_pass unix:/run/php/php-fpm.sock;
        fastcgi_split_path_info ^/freshrss(/.+\.php)(/.*)?$;
        set $path_info $fastcgi_path_info;
        fastcgi_param PATH_INFO $path_info;
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    }
    location ~ ^/freshrss(/.*)?$ {
        root /srv/freshrss/p;
        try_files $1 /freshrss$1/index.php$is_args$args;
    }
Other than the addition of the freshrss prefix this ends up differing slightly from the FreshRSS webserver configuration example. I ended up having to make the path info on the fastcgi_split_path_info optional, and my try_files in the bare directory location directive needed $is_args$args added or I just ended up in a redirect loop because the session parameters didn't get passed through. I'm sure there's a better way to do it, but I did a bunch of searching and this is how I ended up making it work. Before firing up the web configuration I created a suitable database:
$ sudo -Hu postgres psql
psql (13.8 (Debian 13.8-0+deb11u1))
Type "help" for help.
postgres=# create database freshrss;
CREATE DATABASE
postgres=# create user freshrss with encrypted password 'hunter2';
CREATE ROLE
postgres=# grant all privileges on database freshrss to freshrss;
GRANT
postgres=# \q
I ran through the local configuration, creating myself a user and adding some feeds, then created a cronjob to fetch updates hourly and keep a log:
# mkdir /var/log/freshrss
# chown :www-data /var/log/freshrss
# chmod 775 /var/log/freshrss
# cat > /etc/cron.d/freshrss-refresh <<EOF
33 * * * * www-data /srv/freshrss/app/actualize_script.php > /var/log/freshrss/update-$(date --iso-8601=minutes).log 2>&1
EOF
Experiences so far? Reasonably happy. The interface seems snappy enough, and works well both on mobile and desktop. I'm only running a single user instance at present, but am considering opening it up to some other folk and will see how that scales. And it clearly indicated a number of my feeds that were broken, so I've cleaned some up that are still around and deleted the missing ones. Now I just need to figure out what else I should be subscribed to that I've been putting off due to the Feedly limit!

29 November 2022

Jonathan McDowell: onak 0.6.2 released

Over the weekend I released a new version of onak, my OpenPGP compatible keyserver. At 2 years since the last release that means I've at least managed to speed up a bit, but it's fair to say its development isn't a high priority for me at present. This release is largely driven by a collection of minor fixes that have built up, and the knowledge that a Debian freeze is coming in the new year. The fixes largely revolve around the signature verification that was introduced in 0.6.0, which makes it a bit safer to run a keyserver by only accepting key material that can be validated. All of the major items I wanted to work on post 0.6.0 remain outstanding. For the next release I'd like to get some basic Stateless OpenPGP Command Line Interface support integrated. That would then allow onak to be tested with the OpenPGP interoperability test suite, which has recently added support for verification only OpenPGP implementations. I realise most people like to dismiss OpenPGP, and the tooling has been fairly dreadful for as long as I've been using it, but I do think it fills a space that no competing system has bothered to try and replicate. And that's the web of trust, which helps provide some ability to verify keys without relying on (but also without preventing) a central authority to do so. Anyway. Available locally or via GitHub.
0.6.2 - 27th November 2022
  • Don't take creation time from unhashed subpackets
  • Fix ECDSA/SHA1 signature check
  • Fix handling of other signature requirement
  • Fix deletion of keys with PostgreSQL backend
  • Add support for verifying v3 signature packets
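(For context, the Stateless OpenPGP CLI mentioned above defines a simple subcommand interface; the verification piece relevant to onak looks roughly like this, filenames illustrative:)
$ sop verify message.sig signer.cert < message.txt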

28 April 2022

Jonathan McDowell: Resizing consoles automatically

I have 2 very useful shell scripts related to resizing consoles. The first is imaginatively called resize and just configures the terminal to be the requested size, neatly resizing an xterm or gnome-terminal:
#!/bin/sh
# resize <rows> <columns>
/bin/echo -e '\033[8;'$1';'$2't'
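For example, resize 50 132 switches the current terminal to 132 columns by 50 rows.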
The other is a bit more complicated and useful when connecting to a host via a serial console, or when driving a qemu VM with -display none -nographic and all output coming over a serial console on stdio. It figures out the size of the terminal it's running in and correctly sets the local settings to match so you can take full advantage of a larger terminal than the default 80x24:
#!/bin/bash
echo -ne '\e[s\e[5000;5000H'
IFS='[;' read -p $'\e[6n' -d R -a pos -rs
echo -ne '\e[u'
# cols / rows
echo "Size: $ pos[2]  x $ pos[1] "
stty cols "$ pos[2] " rows "$ pos[1] "
export TERM=xterm-256color
Generally I source this with . fix-term or the TERM export doesn't get applied. Both of these exist in various places around the net (and there's a resize binary shipped along with xterm) but I always forget the exact terms to find it again when I need it. So this post is mostly intended to serve as future reference next time I don't have them handy.

3 March 2022

Jonathan McDowell: Neat uses for a backlit keyboard

I bought myself a new keyboard last November, a Logitech G213. True keyboard fans will tell me it's not a real mechanical keyboard, but it was a lot cheaper and met my requirements of having some backlighting and a few media keys (really all I use are the volume control keys). Oh, and being a proper UK layout. While the G213 isn't fully independent RGB per key it does have a set of zones that can be controlled. Also this has been reverse engineered, so there are tools to do this under Linux. All I really wanted was some basic backlighting to make things a bit nicer in the evenings, but with the ability to control colour I felt I should put it to good use. As previously mentioned I have a personal desktop / work laptop setup combined with a UGREEN USB 3.0 Sharing Switch Box, so the keyboard is shared between both machines. So I configured both machines to set the keyboard colour when the USB device is plugged in, and told them to use different colours. Instant visual indication of which machine I'm currently typing on! Running the script on USB detection is easy: a file in /etc/udev/rules.d/. I called it 99-keyboard-colour.rules:
# Change the keyboard colour when we see it
ACTION=="add", SUBSYSTEM=="usb", ATTR idVendor =="046d", ATTR idProduct =="c336", \
        RUN+="/usr/local/sbin/g213-set"
g213-set is a simple bit of Python:
#!/usr/bin/python3
import sys

# Walk the hidraw devices until we find the G213
found = False
devnum = 0
while not found:
    try:
        with open("/sys/class/hidraw/hidraw" + str(devnum) + "/device/uevent") as f:
            for line in f:
                line = line.rstrip()
                if line == 'HID_NAME=Logitech Gaming Keyboard G213':
                    found = True
    except OSError:
        # Ran out of hidraw devices to try
        break
    if not found:
        devnum += 1
if not found:
    print("Could not find keyboard device")
    sys.exit(1)
eventfile = "/dev/hidraw" + str(devnum)
# Raw report: zone byte (z), then red/green/blue values
#                                   z       r     g     b
command = [ 0x11, 0xff, 0x0c, 0x3a, 0, 1, 0xff, 0xff, 0x00, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
with open(eventfile, "wb") as f:
    f.write(bytes(command))
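If the colour doesn't get set when the rule is first dropped in place, explicitly reloading udev's rules and replaying the add event should sort it out (standard udevadm invocations, nothing specific to this setup):
# Reload rules, then re-run add events for USB devices
udevadm control --reload
udevadm trigger --subsystem-match=usb --action=add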
I did wonder about trying to make it turn red when I'm in a root terminal, but that gets a bit more complicated (I'm guessing I need to hook into GNOME Terminal somehow?) and this simple hack gives me a significant win anyway.

23 February 2022

Jonathan McDowell: Upgrading my home internet; a story of yak shaving

RB5009 This has ended up longer than I expected. I'll write up posts about some of the individual steps in more detail at some point, but this is an overview of the yak shaving I engaged in. The TL;DR is:

The desire for a faster connection When I migrated my home connection to FTTP I kept the same 80M/20M profile I'd had on FTTC. I didn't have a pressing need for faster, and I saved money because I was no longer paying for the phone line portion. I wanted more, but at the time I think the only option was a 160M/30M profile instead; I didn't need it and it wasn't enough of an improvement to convince me. Time passed and BT rolled out their GigE (really 900M) download option. And again, I didn't need it, but I wanted it. My provider, Aquiss, initially didn't offer this (I think they had up to 330M download options available by this point). So I stayed on 80M/20M. And the only time I really wanted it to be faster was when pushing off-site backups to rsync.net. Of course, we've had the pandemic, and that's involved 2 adults working from home with plenty of video calls throughout the day. The 80M/20M connection has proved rock solid for this, so again, I didn't feel an upgrade was justified. We got a 4K capable TV last year and while the bandwidth usage for 4K streaming is noticeably higher, again the connection can handle it no problem. At some point last year I noticed Aquiss had added speed options all the way to 900M down. At the end of the year I accepted a new role, which is fully remote, so I had to accept that I wasn't going back into an office any time soon. The combination (and the desire for the increased upload speed) finally allowed me to justify the upgrade to myself.

Testing the current setup for bottlenecks The first thing to do was see whether my internal network could cope with an upgrade. I'm mostly running Cat6 GigE so I wasn't worried about that side of things. However I'm using an RB3011 as my core router, and while it has some coprocessors for routing acceleration they're not supported under mainline Linux (and unlikely to be any time soon). So I had to benchmark what it was capable of routing. I run a handful of VLANs within my home network, with stateful firewalling between them, so I felt that would be a good approximation of the maximum speed to the outside world I might be able to get if I had the external connection upgraded. I went for the easy approach and fired up iPerf3 on 2 hosts, both connected via ethernet but on separate networks, so routed through the RB3011 (the sort of invocation involved is sketched at the end of this section). That resulted in slightly more than 300Mb/s of throughput. Ok. I confirmed that I could get 900Mb/s+ between 2 hosts on the same network, just to be sure there wasn't some other issue I was missing. Nope, so unsurprisingly the router was the bottleneck. So: to upgrade my internet speed I needed to upgrade my router. I could just buy something off the shelf, but I like being able to run Debian (or OpenWRT) on the router rather than some horrible vendor firmware. Luckily MikroTik launched the RB5009 towards the end of last year. RouterOS is probably more than capable, but what really interested me was the fact it's an ARM64 platform based on an Armada 7040, which is pretty well supported in mainline kernels already. There's a 10G connection from the internal switch to the CPU, as well as a 2.5Gb/s ethernet port and a 10G SFP+ cage. All good stuff. I ordered one just before the New Year. Thankfully the OpenWRT folk had done all of the hard work on getting a mainline kernel booting on the device; Sergey Sergeev and Robert Marko in particular fighting RouterBoot and producing a suitable device tree file to get everything up and running. I ended up soldering up a serial console connection to aid debugging, and lightly patching Rob's u-boot to fix the incorrect RAM size reported by RouterBoot. A few kernel tweaks were necessary to make the networking entirely happy, and at that point it was time to think about actually doing a replacement.
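For reference, the iPerf3 invocation is the standard client/server pair, arranged so the traffic has to cross the router (hostnames here are placeholders):
# On a host in one VLAN
iperf3 -s
# On a host in a different VLAN, so the traffic is routed via the RB3011
iperf3 -c hosta.example.org -t 30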

Upgrading to Debian 11 (bullseye) My RB3011 is currently running Debian 10 (buster); an upgrade has been on my todo list, but with the impending replacement I decided I'd hold off and create a new Debian 11 (bullseye) image for the RB5009. Additionally, I don't actually run off the internal NAND in the RB3011; I have a USB flash drive for the rootfs and just the kernel booting off internal NAND. Originally this was for ease of testing, then a combination of needing to figure out a good read-only root solution and a small enough image to fit in the 120M available. For the upgrade I decided to finally look at these pieces. I've ended up with a script that will build me a squashfs image, and the initial rootfs takes care of mounting this and then a tmpfs as an overlay fs. That means I can easily see what pieces are being written to. The RB5009 has a total of 1G NAND so I'm not as space constrained, but the squashfs ends up under 50M. I've added some additional pieces to allow me to pre-populate the overlay fs with updates rather than always needing to rebuild the squashfs image. With that done I decided to try it out on the RB3011; I tweaked the build script to be able to build for armhf (the RB3011) or arm64 (the RB5009) and to deal with some slight differences in configuration between the two (e.g. interface naming). The idea here was to ensure I'd got all the appropriate configuration sorted for the RB5009, in the known-good existing environment. Everything is still on a USB stick at this stage, and the new image has an armhf busybox root, meaning it can be used on either device; the init script detects the architecture to select the appropriate squashfs to mount.
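A minimal sketch of what that initial rootfs dance looks like, assuming the squashfs lives as a file on the boot media (the paths and names are illustrative, not the actual script):
#!/bin/sh
# Mount the read-only squashfs image, and a tmpfs to catch runtime writes
mount -o loop,ro /boot/root.squashfs /ro
mount -t tmpfs tmpfs /rw
mkdir -p /rw/upper /rw/work
# Overlay the tmpfs over the squashfs; anything written lands in /rw/upper
mount -t overlay overlay -o lowerdir=/ro,upperdir=/rw/upper,workdir=/rw/work /newroot
exec switch_root /newroot /sbin/init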

A problem with ESP8266 home automation devices Everything seemed to work fine - a few niggles with the watchdog, which is overly sensitive on the RB3011, but I got those sorted (and the build script updated) and the device came up and successfully did the PPPoE dance to bring up external connectivity. And then I noticed that my home automation devices were having problems connecting to the mosquitto MQTT server. It turned out it was only the ESP8266 based devices that were failing, and examining the serial debug output on one of my test devices revealed it was hitting an out of memory issue (displaying E:M 280) when establishing the TLS MQTT connection. I rolled back to the Debian 10 image and set about creating a test environment to look at the ESP8266 issues. My first action was to reduce my RAM footprint to try and ensure there was enough spare to establish the connection. I moved a few functions that were still sitting in IRAM into flash. I cleaned up a couple of buffers that are on the stack to be more correctly sized. I tried my new image, and I didn't get the memory issue. Instead I progressed a bit further and got a watchdog reset. Doh! It was obviously something related to the TLS connection, but I couldn't easily see what the difference was; the same x509 cert was in use, it looked like the initial handshake was the same (and trying with openssl s_client looked pretty similar too). I set about instrumenting the ancient Mbed TLS used in the Espressif SDK and discovered that whatever had changed between buster + bullseye meant the ESP8266 was now trying a TLS-DHE-RSA-WITH-AES-256-CBC-SHA256 handshake instead of a TLS-RSA-WITH-AES-256-CBC-SHA256 handshake, and that was causing enough extra CPU usage that it couldn't complete in time and the watchdog kicked in. So I commented out MBEDTLS_KEY_EXCHANGE_DHE_RSA_ENABLED in the config_esp.h for mbedtls and rebuilt things. Hacky, but I'll go back to trying to improve this properly at some point.
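The change itself amounts to a one-liner; something like this, though the exact path to config_esp.h within the Espressif SDK tree is a guess here:
# Comment out the DHE-RSA key exchange in the mbedtls ESP8266 config
sed -i 's|^#define MBEDTLS_KEY_EXCHANGE_DHE_RSA_ENABLED|// &|' config_esp.h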

A detour into interrupt load Now, my testing of the RB3011 image is generally done at weekends, when I have enough time to tear down and rebuild the connection, rather than doing it in the evening and having limited time to get things working again in time for work in the morning. So at the point I had an image ready to go I pulled the trigger on the line upgrade. I went with the 500M/75M option rather than the full 900M - I suspect I'd have difficulty actually getting that most of the time, and 75M of upload bandwidth seems fairly substantial for now. It only took a couple of days from the order to the point the line was regraded (which involved no real downtime - just a reconnection in the night). Of course this happened just after the weekend I'd discovered the ESP8266 issue. collectd CPU usage for RB3011 This provided an opportunity to see just what the RB3011 could actually manage. In the configuration I had, it turned out to be not much more than the 80Mb/s speeds I had previously seen. The upload jumped from a solid 20Mb/s to 75Mb/s, so I knew the regrade had actually happened. Looking at CPU utilisation clearly showed the problem: softirqs were using almost 100% of a CPU core. Now, the way the hardware is set up on the RB3011 is that there are two separate 5 port switches, each connected back to the CPU via a separate GigE interface. For various reasons I had everything on a single switch, which meant that all traffic was boomeranging in and out of the same CPU interface. The IPQ8064 has dual cores, so I thought I'd try moving the external connection to the other switch. That puts it on its own GigE CPU interface, which then allows binding the interrupts to a different CPU core. That helps; throughput to the outside world hits 140Mb/s+. Still a long way from the expected max, but proof that we just need more grunt.
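Moving those interrupts is done through /proc; roughly the following, though the IRQ number is a placeholder and the real one comes from /proc/interrupts:
# Find the IRQ for the GigE CPU interface carrying the external connection
grep eth1 /proc/interrupts
# Bind IRQ 45 to CPU core 1 (the value is a hex bitmask of allowed cores)
echo 2 > /proc/irq/45/smp_affinity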

Success collectd CPU usage for RB5009 Which brings us to this past weekend, when, having worked out all the other bits, I tried the squashfs root image again on the RB3011. Success! The home automation bits connected to it, the link to the outside world came up, everything seemed happy. So I double checked my bootloader bits on the RB5009, brought it down to the comms room and plugged it in instead. And, modulo my failing to update the nftables config to allow it to do forwarding, it all came up ok. Some testing with iperf3 internally got a nice 912Mb/s sustained between subnets, and some less scientific testing with wget + speedtest-cli saw speeds of over 460Mb/s to the outside world. Time from ordering the router until it was in service? Just under 8 weeks.

6 February 2022

Jonathan McDowell: Free Software Activities for 2021

About a month later than I probably should have posted it, here's a recap of my Free Software activities in 2021. For previous years see 2019 + 2020. Again, this year had fewer contributions than I'd like, thanks to continuing fatigue about the state of the world, and trying to work on separation between work and leisure while working from home. I've made some effort to improve that balance but it's still a work in progress.

Conferences No surprise, I didn't attend any in-person conferences in 2021. I find virtual conferences don't do a lot for me (a combination of my not carving time out for them in the same way, because not being at the conference means other things will inevitably intrude, and the lack of the social side) but I did get to attend a few of the DebConf21 talks, which was nice. I'm hoping to make it to DebConf22 this year in person.

Debian Most of my contributions to Free Software continue to happen within Debian. As part of the Data Protection Team I responded to various inbound queries to that team. Some of this involved chasing up other project teams who had been slow to respond - folks, if you're running a service that stores personal data about people then you need to be responsive to requests about it. Some of this was dealing with what look like automated scraping tools which send no information about the person making the request, and in all the cases we've seen so far there's been no indication of any data about that person on any systems we have access to. Further team time was wasted dealing with the Princeton-Radboud Study on Privacy Law Implementation (though Matthew did the majority of the work on this). The Debian Keyring was possibly my largest single point of contribution. We're in a roughly 3 month rotation of who handles the keyring updates, and I handled 2021.03.24, 2021.04.09, 2021.06.25, 2021.09.25 + 2021.12.24. For Debian New Members I'm mostly inactive as an application manager - we generally seem to have enough available recently. If that changes I'll look at stepping in to help, but I don't see that happening. I continue to be involved in Front Desk, having various conversations throughout the year with the rest of the team, but there's no doubt Mattia and Pierre-Elliott are the real doers at present. I did take part in an NM Committee appeals process. In terms of package uploads I continued to work on gcc-xtensa-lx106, largely doing uploads to deal with updates to the GCC version or packaging (8 + 9). sigrok had a few minor updates, libsigrok 0.5.2-3, pulseview 0.4.2-3, as well as a new upstream release of sigrok CLI 0.7.2-1. There was a last minute pre-release upload of libserialport 0.1.1-4 thanks to a kernel change in v5.10.37 which removed termiox support. Despite still not writing any VHDL these days I continue to keep an eye on ghdl, because I found it a useful tool in the past. Last year that was just a build fix for LLVM 11.1.0 - 1.0.0+dfsg+5. Andreas Bombe has largely taken over more proactive maintenance, which is nice to see. I uploaded OpenOCD 0.11.0~rc1-2, cleaning up some packaging / dependency issues. This was followed by 0.11.0~rc2-1 as a newer release candidate. Sadly 0.11.0 did not make it in time for bullseye, but rc2 was fairly close and I uploaded 0.11.0-1 once bullseye was released. Finally I did a drive-by upload for garmin-forerunner-tools 0.10repacked-12, cleaning up some packaging issues and uploading it to salsa. My Forerunner 305 has died (after 11 years of sterling service) and the Forerunner 45 I've replaced it with uses a different set of tools, so I decided it didn't make sense to pick up longer term ownership of the package.

Linux My Linux contributions continued to revolve around pushing MikroTik RB3011 support upstream. There was a minor code change to Set FIFO sizes for ipq806x (which fixed up the allowed MTU for the internal switch + VLANs). The rest was DTS related - adding ADM DMA + NAND definitions now that the ADM driver was merged, adding tsens details, adding USB port info and adding the L2CC and RPM details for IPQ8064. Finally I was able to update the RB3011 DTS to enable NAND + USB. With all those in I'm down to 4 local patches against a mainline kernel, all of which are hacks that aren't suitable for submission upstream. 2 are for patching in details of the root device and ethernet MAC addresses, one is dealing with the fact the IPQ8064 has some reserved memory that doesn't play well with AUTO_ZRELADDR (there keep being efforts to add some support for this via devicetree, but unfortunately it gets shot down every time), and the final one is a hack to turn off the LCD backlight by treating it as an LED (actually supporting the LCD properly is on my TODO list).

Personal projects 2021 didn't see any releases of onak. It's not dead, just resting, but Sequoia PGP is probably where you should be looking for a modern OpenPGP implementation. I continued work on my Desk Viking project, which is an STM32F103 based debug tool inspired by the Bus Pirate. The main addition was some CCLib support (forking it in the process to move to Python 3 and add some speed ups) to allow me to program my Zigbee dongles, but I also added some 1-Wire search logic and some support for Linux emulation mode with VCD output to allow for a faster development cycle. I really want to try and get OpenOCD JTAG mode supported at some point, and have vague plans for an STM32F4 based version that have suffered from a combination of a silicon shortage and a lack of time. That wraps up 2021. I'd like to say I'm hoping to make more Free Software contributions this year, but I don't have a concrete plan yet for how that might happen, so I'll have to wait and see.

4 January 2022

Jonathan McDowell: Upgrading from a CC2531 to a CC2538 Zigbee coordinator

Previously I set up a CC2531 as a Zigbee coordinator for my home automation. This has turned out to be a good move, with the 4 gang wireless switch being particularly useful. However the range of the CC2531 is fairly poor; it has a simple PCB antenna. It's also a very basic device. I set about trying to improve the range and scalability and settled upon a CC2538 + CC2592 device, which features an MMCX antenna connector. This device also has the advantage that it's ARM based, which I'm hopeful means I might be able to build some firmware myself using a standard GCC toolchain. For now I fetched the JetHome firmware from https://github.com/jethome-ru/zigbee-firmware/tree/master/ti/coordinator/cc2538_cc2592 (JH_2538_2592_ZNP_UART_20211222.hex) - while it's possible to do USB directly with the CC2538 my board doesn't have those bits, so going the external USB UART route is easier. The device had some existing firmware on it, so I needed to erase this to force a drop into the boot loader. That means soldering up the JTAG pins and hooking it up to my Bus Pirate for OpenOCD goodness.
OpenOCD config
source [find interface/buspirate.cfg]
buspirate_port /dev/ttyUSB1
buspirate_mode normal
buspirate_vreg 1
buspirate_pullup 0
transport select jtag
source [find target/cc2538.cfg]
Steps to erase
$ telnet localhost 4444
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Open On-Chip Debugger
> mww 0x400D300C 0x7F800
> mww 0x400D3008 0x0205
> shutdown
shutdown command invoked
Connection closed by foreign host.
At that point I can switch to the UART connection (on PA0 + PA1) and flash using cc2538-bsl:
$ git clone https://github.com/JelmerT/cc2538-bsl.git
$ cc2538-bsl/cc2538-bsl.py -p /dev/ttyUSB1 -e -w -v ~/JH_2538_2592_ZNP_UART_20211222.hex
Opening port /dev/ttyUSB1, baud 500000
Reading data from /home/noodles/JH_2538_2592_ZNP_UART_20211222.hex
Firmware file: Intel Hex
Connecting to target...
CC2538 PG2.0: 512KB Flash, 32KB SRAM, CCFG at 0x0027FFD4
Primary IEEE Address: 00:12:4B:00:22:22:22:22
    Performing mass erase
Erasing 524288 bytes starting at address 0x00200000
    Erase done
Writing 524256 bytes starting at address 0x00200000
Write 232 bytes at 0x0027FEF88
    Write done
Verifying by comparing CRC32 calculations.
    Verified (match: 0x74f2b0a1)
I then wanted to migrate from the old device to the new without having to re-pair everything. So I shut down Home Assistant and backed up the CC2531 network information using zigpy-znp (which is already installed for Home Assistant):
python3 -m zigpy_znp.tools.network_backup /dev/zigbee > cc2531-network.json
I copied the backup to cc2538-network.json and modified the coordinator_ieee to be the new device's MAC address (rather than end up with 2 devices claiming the same MAC if/when I reuse the CC2531) and did:
python3 -m zigpy_znp.tools.network_restore --input cc2538-network.json /dev/ttyUSB1
The old CC2531 needed to be unplugged first, otherwise I got a RuntimeError: Network formation refused, RF environment is likely too noisy. Temporarily unscrew the antenna or shield the coordinator with metal until a network is formed. error. After that I updated my udev rules to map the CC2538 to /dev/zigbee (something like the rule sketched at the end of this post) and restarted Home Assistant. To my surprise it came up and detected the existing devices without any extra effort on my part. However that resulted in 2 coordinators being shown in the visualisation, with the old one turning up as unk_manufacturer. Fixing that involved editing /etc/homeassistant/.storage/core.device_registry and removing the entry which had the old MAC address, removing the device entry in /etc/homeassistant/.storage/zha.storage for the old MAC and then finally firing up sqlite to modify the Zigbee database:
$ sqlite3 /etc/homeassistant/zigbee.db
SQLite version 3.34.1 2021-01-20 14:10:07
Enter ".help" for usage hints.
sqlite> DELETE FROM devices_v6 WHERE ieee = '00:12:4b:00:11:11:11:11';
sqlite> DELETE FROM endpoints_v6 WHERE ieee = '00:12:4b:00:11:11:11:11';
sqlite> DELETE FROM in_clusters_v6 WHERE ieee = '00:12:4b:00:11:11:11:11';
sqlite> DELETE FROM neighbors_v6 WHERE ieee = '00:12:4b:00:11:11:11:11' OR device_ieee = '00:12:4b:00:11:11:11:11';
sqlite> DELETE FROM node_descriptors_v6 WHERE ieee = '00:12:4b:00:11:11:11:11';
sqlite> DELETE FROM out_clusters_v6 WHERE ieee = '00:12:4b:00:11:11:11:11';
sqlite> .quit
So far it all seems a bit happier than with the CC2531; I've been able to pair a light bulb that was previously detected but would not integrate, which suggests the range is improved. (This post is another in the set of things I should write down so I can just grep my own website when I forget what I did to do foo.)
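That udev mapping would be something along these lines; the vendor/product IDs here assume a CP210x style USB UART and are purely hypothetical, so check lsusb for the real ones:
# Map the CC2538's USB UART to a stable /dev/zigbee name
SUBSYSTEM=="tty", ATTRS{idVendor}=="10c4", ATTRS{idProduct}=="ea60", SYMLINK+="zigbee"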

2 December 2021

Jonathan McDowell: Building a desktop to improve my work/life balance

ASRock DeskMini X300 It's been over 20 months since the first COVID lockdown kicked in here in Northern Ireland and I started working from home. Even when the strict lockdown was lifted the advice here has continued to be "If you can work from home you should work from home". I've been into the office here and there (for new starts, given you need to hand over a laptop and sort out some login details, it's generally easier to do so in person, and I've had a couple of whiteboard sessions that needed the high bandwidth face to face communication), but day to day is all from home. Early on I commented that work had taken over my study. This has largely continued to be true. I set my work laptop on the stand on a Monday morning and it sits there until Friday evening, when it gets switched for the personal laptop. I have a lovely LG 34UM88 21:9 Ultrawide monitor, and my laptops are small and light so I much prefer to use them docked. Also my general working pattern is to have a lot of external connections up and running (build machine, test devices, log host) which means a suspend/resume cycle disrupts things. So I like to minimise moving things about. I spent a little bit of time trying to find a dual laptop stand so I could have both machines set up and switch between them easily, but I didn't find anything that didn't seem to be geared up for DJs, with a mixer + laptop combo taking up quite a bit of desk space rather than stacking laptops vertically. Eventually I realised that the right move was probably a desktop machine. Now, I haven't had a desktop machine since before I moved to the US, having realised at the time that having everything on my laptop was much more convenient. I decided I didn't want something too big and noisy. Cheap GPUs seem hard to get hold of these days - I'm not a gamer so all I need is something that can drive a ~4K monitor reliably enough. Looking around, the AMD Ryzen 7 5700G seemed to be a decent CPU with one of the better integrated GPUs. I spent some time looking for a reasonable Mini-ITX case + motherboard and then I happened upon the ASRock DeskMini X300. This turns out to be perfect; I've no need for a PCIe slot or anything more than an m.2 SSD. I paired it with a Noctua NH-L9a-AM4 heatsink + fan (same as I use in the house server), 32GB DDR4 and a 1TB WD SN550 NVMe SSD. Total cost just under £650 inc VAT + delivery (and that's a story for another post). A desktop solves the problem of fitting both machines on the desk at once, but there's still the question of smoothly switching between them. I read Evgeni Golov's article on a simple KVM switch for £30. My monitor has multiple inputs, so that's sorted. I did have a cheap USB2 switch (all I need for the keyboard/trackball) but it turned out to be pretty unreliable at the host detecting the USB change. I bought a UGREEN USB 3.0 Sharing Switch Box instead and it's turned out to be pretty reliable. The problem is that the LG 34UM88 turns out to have a poor DDC implementation, so while I can flip the keyboard easily with the UGREEN box I also have to manually select the monitor input. Which is a bit annoying, but not terrible. The important question is whether this has helped. I built all this at the end of October, so I've had a month to play with it. Turns out I should have done it at some point last year. At the end of the day, instead of either sitting at work for a bit longer or completely avoiding the study, I'm able to lock the work machine and flick to my personal setup.
Even sitting in the same seat, that "disconnect", and the knowledge I won't see work Slack messages or emails come in and feel I should respond, really helps. It also means I have access to my personal setup during the week without incurring a hit at the start of the working day when I have to set things up again. So it's much easier to just dip into some personal tech stuff in the evening than it was previously. And since I don't need to set up the personal config each time, I can pick up where I left off. All of which is really nice. It's also got me thinking about other minor improvements I should make to my home working environment to try and improve things. One obvious thing now the winter is here again is to improve my lighting; I have a good overhead LED panel but it's terribly positioned for video calls, being just behind me. So I think I'm looking for some sort of strip light I can have behind the large monitor to give a decent degree of backlight (possibly bouncing off the white wall). Lots of cheap options I'm not convinced about, and I've had a few ridiculously priced options from photographer friends; suggestions welcome.

28 September 2021

Jonathan McDowell: Adding Zigbee to my home automation

SonOff Zigbee Door Sensor My home automation setup has been fairly static recently; it does what we need and generally works fine. One area I think could be better is controlling it; we have access to Home Assistant on our phones, and the Alexa downstairs can control things, but there are no smart assistants upstairs and sometimes it would be nice to just push a button to turn on the light rather than having to get my phone out. Thanks to the fact the UK generally doesn't have a neutral wire in wall switches, that means looking at something battery powered. Which means wifi based devices are a poor choice, and it's necessary to look at something lower power like Zigbee or Z-Wave. Zigbee seems like the better choice; it's a more open standard and there are generally more devices easily available from what I've seen (e.g. Philips Hue and IKEA TRÅDFRI). So I bought a couple of Xiaomi Mi Smart Home Wireless Switches, and a CC2530 module, and then ignored it for the best part of a year. Finally I got around to flashing the Z-Stack firmware that Koen Kanters kindly provides. (Insert rant about hardware manufacturers that require pay-for tool chains. The CC2530 is even worse because it's 8051 based, so SDCC should be able to compile for it, but the TI Zigbee libraries are only available in a format suitable for IAR's embedded workbench.) Flashing the CC2530 is a bit of faff. I ended up using the CCLib fork by Stephan Hadinger which supports the ESP8266. The nice thing about the CC2530 module is it has 2.54mm pitch pins, so it's nice and easy to jumper up. It then needs a USB/serial dongle to connect it up to a suitable machine, where I ran Zigbee2MQTT. This scares me a bit, because it's a bunch of node.js pulling in a chunk of stuff off npm. On the flip side, it Just Works and I was able to pair the Xiaomi button with the device and see MQTT messages that I could then use with Home Assistant. So of course I tore down that setup and went and ordered a CC2531 (the variant with USB as part of the chip). The idea here was my test setup was upstairs with my laptop, and I wanted something hooked up in a more permanent fashion. Once the CC2531 arrived I got distracted writing support for the Desk Viking to support CCLib (and modified it a bit for Python3 and some speed ups). I flashed the dongle up with the Z-Stack Home 1.2 (default) firmware, and plugged it into the house server. At this point I more closely investigated what Home Assistant had to offer in terms of Zigbee integration. It turns out the ZHA integration has support for the ZNP protocol that the TI devices speak (I'm reasonably sure it didn't when I first looked some time ago), so that seemed like a better option than adding the MQTT layer in the middle. I hit some complexity passing the dongle (which turns up as /dev/ttyACM0) through to the Home Assistant container. First I needed an override file in /etc/systemd/nspawn/hass.nspawn:
[Files]
Bind=/dev/ttyACM0:/dev/zigbee
[Network]
VirtualEthernet=true
(I'm not clear why the VirtualEthernet needed to exist; without it networking broke entirely, but I couldn't see why it worked with no override file.) A udev rule on the host to change the ownership of the device file, so the root user and dialout group in the container could see it, was also necessary; so into /etc/udev/rules.d/70-persistent-serial.rules went:
# Zigbee for HASS
SUBSYSTEM=="tty", ATTRS idVendor =="0451", ATTRS idProduct =="16a8", SYMLINK+="zigbee", \
	MODE="660", OWNER="1321926676", GROUP="1321926676"
In the container itself I had to switch PrivateDevices=true to PrivateDevices=false in the home-assistant.service file (which took me a while to figure out; yay for locking things down and then needing to use those locked down things; a tidier drop-in approach is sketched at the end of this post). Finally I added the hass user to the dialout group. At that point I was able to go and add the integration with Home Assistant, and add the button as a new device. Excellent. I did find I needed a newer version of Home Assistant to get support for the button, however. I was still on 2021.1.5 due to upstream dropping support for Python 3.7 and not being prepared to upgrade to Debian 11 until it was actually released, so the version of zha-quirks didn't have the correct info. Upgrading to Home Assistant 2021.8.7 sorted that out. There was another slight problem. Range. Really I want to use the button upstairs. The server is downstairs, and most of my internal walls are brick. The solution turned out to be a TRÅDFRI socket, which replaced the existing ESP8266 wifi socket controlling the stair lights. That was close enough to the server to have a decent signal, and it acts as a Zigbee router so provides a strong enough signal for devices upstairs. The normal approach seems to be to have a lot of Zigbee light bulbs, but I have mostly kept overhead lights as uncontrolled - we don't use them day to day and it provides a nice fallback if the home automation has issues. Of course installing Zigbee for a single button would seem to be a bit pointless. So I ordered up a Sonoff door sensor to put on the front door (much smaller than expected - those white boxes on the door are it in the picture above). And I have a 4 gang wireless switch ordered to go on the landing wall upstairs. Now I've got a Zigbee setup there are a few more things I'm thinking of adding, where wifi isn't an option due to the need for battery operation (monitoring the external gas meter springs to mind). The CC2530 probably isn't suitable for my needs, as I'll need to write some custom code to handle the bits I want, but there do seem to be some ARM based devices which might well prove suitable.
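For reference, the tidier approach to that PrivateDevices change is a standard systemd drop-in, rather than editing the unit file shipped in the container (sketched from memory; treat the drop-in file name as arbitrary):
# Inside the container
mkdir -p /etc/systemd/system/home-assistant.service.d
cat > /etc/systemd/system/home-assistant.service.d/zigbee.conf <<EOF
[Service]
PrivateDevices=false
EOF
systemctl daemon-reload
systemctl restart home-assistant
# And the hass user needs to be able to read the serial device
usermod -a -G dialout hass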

3 June 2021

Jonathan McDowell: Digging into Kubernetes containers

Having built a single node Kubernetes cluster and had a poke at what it's doing in terms of networking, the next thing I want to do is figure out what it's doing in terms of containers. You might argue this should have come before networking, but to me the networking piece is more non-standard than the container piece, so I wanted to understand that first. Let's start with a process listing on the host: ps faxno user,stat,cmd. There are a number of processes from the host kernel we don't care about:
kernel processes
    USER STAT CMD
       0 S    [kthreadd]
       0 I<    \_ [rcu_gp]
       0 I<    \_ [rcu_par_gp]
       0 I<    \_ [kworker/0:0H-events_highpri]
       0 I<    \_ [mm_percpu_wq]
       0 S     \_ [rcu_tasks_rude_]
       0 S     \_ [rcu_tasks_trace]
       0 S     \_ [ksoftirqd/0]
       0 I     \_ [rcu_sched]
       0 S     \_ [migration/0]
       0 S     \_ [cpuhp/0]
       0 S     \_ [cpuhp/1]
       0 S     \_ [migration/1]
       0 S     \_ [ksoftirqd/1]
       0 I<    \_ [kworker/1:0H-kblockd]
       0 S     \_ [cpuhp/2]
       0 S     \_ [migration/2]
       0 S     \_ [ksoftirqd/2]
       0 I<    \_ [kworker/2:0H-events_highpri]
       0 S     \_ [cpuhp/3]
       0 S     \_ [migration/3]
       0 S     \_ [ksoftirqd/3]
       0 I<    \_ [kworker/3:0H-kblockd]
       0 S     \_ [kdevtmpfs]
       0 I<    \_ [netns]
       0 S     \_ [kauditd]
       0 S     \_ [khungtaskd]
       0 S     \_ [oom_reaper]
       0 I<    \_ [writeback]
       0 S     \_ [kcompactd0]
       0 SN    \_ [ksmd]
       0 SN    \_ [khugepaged]
       0 I<    \_ [kintegrityd]
       0 I<    \_ [kblockd]
       0 I<    \_ [blkcg_punt_bio]
       0 I<    \_ [edac-poller]
       0 I<    \_ [devfreq_wq]
       0 I<    \_ [kworker/0:1H-kblockd]
       0 S     \_ [kswapd0]
       0 I<    \_ [kthrotld]
       0 I<    \_ [acpi_thermal_pm]
       0 I<    \_ [ipv6_addrconf]
       0 I<    \_ [kstrp]
       0 I<    \_ [zswap-shrink]
       0 I<    \_ [kworker/u9:0-hci0]
       0 I<    \_ [kworker/2:1H-kblockd]
       0 I<    \_ [ata_sff]
       0 I<    \_ [sdhci]
       0 S     \_ [irq/39-mmc0]
       0 I<    \_ [sdhci]
       0 S     \_ [irq/42-mmc1]
       0 S     \_ [scsi_eh_0]
       0 I<    \_ [scsi_tmf_0]
       0 S     \_ [scsi_eh_1]
       0 I<    \_ [scsi_tmf_1]
       0 I<    \_ [kworker/1:1H-kblockd]
       0 I<    \_ [kworker/3:1H-kblockd]
       0 S     \_ [jbd2/sda5-8]
       0 I<    \_ [ext4-rsv-conver]
       0 S     \_ [watchdogd]
       0 S     \_ [scsi_eh_2]
       0 I<    \_ [scsi_tmf_2]
       0 S     \_ [usb-storage]
       0 I<    \_ [cfg80211]
       0 S     \_ [irq/130-mei_me]
       0 I<    \_ [cryptd]
       0 I<    \_ [uas]
       0 S     \_ [irq/131-iwlwifi]
       0 S     \_ [card0-crtc0]
       0 S     \_ [card0-crtc1]
       0 S     \_ [card0-crtc2]
       0 I<    \_ [kworker/u9:2-hci0]
       0 I     \_ [kworker/3:0-events]
       0 I     \_ [kworker/2:0-events]
       0 I     \_ [kworker/1:0-events_power_efficient]
       0 I     \_ [kworker/3:2-events]
       0 I     \_ [kworker/1:1]
       0 I     \_ [kworker/u8:1-events_unbound]
       0 I     \_ [kworker/0:2-events]
       0 I     \_ [kworker/2:2]
       0 I     \_ [kworker/u8:0-events_unbound]
       0 I     \_ [kworker/0:1-events]
       0 I     \_ [kworker/0:0-events]
There are various basic host processes, including my SSH connections, and Docker. I note it's using containerd. We also see kubelet, the Kubernetes node agent.
host processes
    USER STAT CMD
       0 Ss   /sbin/init
       0 Ss   /lib/systemd/systemd-journald
       0 Ss   /lib/systemd/systemd-udevd
     101 Ssl  /lib/systemd/systemd-timesyncd
       0 Ssl  /sbin/dhclient -4 -v -i -pf /run/dhclient.enx00e04c6851de.pid -lf /var/lib/dhcp/dhclient.enx00e04c6851de.leases -I -df /var/lib/dhcp/dhclient6.enx00e04c6851de.leases enx00e04c6851de
       0 Ss   /usr/sbin/cron -f
     104 Ss   /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
       0 Ssl  /usr/sbin/dockerd -H fd://
       0 Ssl  /usr/sbin/rsyslogd -n -iNONE
       0 Ss   /usr/sbin/smartd -n
       0 Ss   /lib/systemd/systemd-logind
       0 Ssl  /usr/bin/containerd
       0 Ss+  /sbin/agetty -o -p -- \u --noclear tty1 linux
       0 Ss   sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
       0 Ss    \_ sshd: root@pts/1
       0 Ss        \_ -bash
       0 R+            \_ ps faxno user,stat,cmd
       0 Ss    \_ sshd: noodles [priv]
    1000 S         \_ sshd: noodles@pts/0
    1000 Ss+           \_ -bash
       0 Ss   /lib/systemd/systemd --user
       0 S     \_ (sd-pam)
    1000 Ss   /lib/systemd/systemd --user
    1000 S     \_ (sd-pam)
       0 Ssl  /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.4.1
And that just leaves a bunch of container-related processes:
container processes
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id fd95c597ff3171ff110b7bf440229e76c5108d5d93be75ffeab54869df734413 -address /run/containerd/containerd.sock
       0 Ss    \_ /pause
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id c2ff2c50f0bc052feda2281741c4f37df7905e3b819294ec645148ae13c3fe1b -address /run/containerd/containerd.sock
       0 Ss    \_ /pause
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 589c1545d9e0cdf8ea391745c54c8f4db49f5f437b1a2e448e7744b2c12f8856 -address /run/containerd/containerd.sock
       0 Ss    \_ /pause
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 6f417fd8a8c573a2b8f792af08cdcd7ce663457f0f7218c8d55afa3732e6ee94 -address /run/containerd/containerd.sock
       0 Ss    \_ /pause
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id afa9798c9f663b21df8f38d9634469e6b4db0984124547cd472a7789c61ef752 -address /run/containerd/containerd.sock
       0 Ssl   \_ kube-scheduler --authentication-kubeconfig=/etc/kubernetes/scheduler.conf --authorization-kubeconfig=/etc/kubernetes/scheduler.conf --bind-address=127.0.0.1 --kubeconfig=/etc/kubernetes/scheduler.conf --leader-elect=true --port=0
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 4b3708b62f4d427690f5979848c59fce522dab6c62a9c53b806ffbaef3f88e62 -address /run/containerd/containerd.sock
       0 Ssl   \_ kube-controller-manager --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf --bind-address=127.0.0.1 --client-ca-file=/etc/kubernetes/pki/ca.crt --cluster-name=kubernetes --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt --cluster-signing-key-file=/etc/kubernetes/pki/ca.key --controllers=*,bootstrapsigner,tokencleaner --kubeconfig=/etc/kubernetes/controller-manager.conf --leader-elect=true --port=0 --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --root-ca-file=/etc/kubernetes/pki/ca.crt --service-account-private-key-file=/etc/kubernetes/pki/sa.key --use-service-account-credentials=true
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 89f35bf7a825eb97db7035d29aa475a3a1c8aaccda0860a46388a3a923cd10bc -address /run/containerd/containerd.sock
       0 Ssl   \_ kube-apiserver --advertise-address=192.168.53.147 --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key --etcd-servers=https://127.0.0.1:2379 --insecure-port=0 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-account-signing-key-file=/etc/kubernetes/pki/sa.key --service-cluster-ip-range=10.96.0.0/12 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 2dabff6e4f59c96d931d95781d28314065b46d0e6f07f8c65dc52aa465f69456 -address /run/containerd/containerd.sock
       0 Ssl   \_ etcd --advertise-client-urls=https://192.168.53.147:2379 --cert-file=/etc/kubernetes/pki/etcd/server.crt --client-cert-auth=true --data-dir=/var/lib/etcd --initial-advertise-peer-urls=https://192.168.53.147:2380 --initial-cluster=udon=https://192.168.53.147:2380 --key-file=/etc/kubernetes/pki/etcd/server.key --listen-client-urls=https://127.0.0.1:2379,https://192.168.53.147:2379 --listen-metrics-urls=http://127.0.0.1:2381 --listen-peer-urls=https://192.168.53.147:2380 --name=udon --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt --peer-client-cert-auth=true --peer-key-file=/etc/kubernetes/pki/etcd/peer.key --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt --snapshot-count=10000 --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 73fae81715b670255b66419a7959798b287be7bbb41e96f8b711fa529aa02f0d -address /run/containerd/containerd.sock
       0 Ss    \_ /pause
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 26d92a720c560caaa5f8a0217bc98e486b1c032af6c7c5d75df508021d462878 -address /run/containerd/containerd.sock
       0 Ssl   \_ /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf --hostname-override=udon
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 7104f65b5d92a56a2df93514ed0a78cfd1090ca47b6ce4e0badc43be6c6c538e -address /run/containerd/containerd.sock
       0 Ss    \_ /pause
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 48d735f7f44e3944851563f03f32c60811f81409e7378641404035dffd8c1eb4 -address /run/containerd/containerd.sock
       0 Ssl   \_ /usr/bin/weave-npc
       0 S<        \_ /usr/sbin/ulogd -v
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 36b418e69ae7076fe5a44d16cef223d8908016474cb65910f2fd54cca470566b -address /run/containerd/containerd.sock
       0 Ss    \_ /bin/sh /home/weave/launch.sh
       0 Sl        \_ /home/weave/weaver --port=6783 --datapath=datapath --name=12:82:8f:ed:c7:bf --http-addr=127.0.0.1:6784 --metrics-addr=0.0.0.0:6782 --docker-api= --no-dns --db-prefix=/weavedb/weave-net --ipalloc-range=192.168.0.0/24 --nickname=udon --ipalloc-init consensus=0 --conn-limit=200 --expect-npc --no-masq-local
       0 Sl        \_ /home/weave/kube-utils -run-reclaim-daemon -node-name=udon -peer-name=12:82:8f:ed:c7:bf -log-level=debug
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 534c0a698478599277482d97a137fab8ef4d62db8a8a5cf011b4bead28246f70 -address /run/containerd/containerd.sock
       0 Ss    \_ /pause
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 9ffd6b668ddfbf3c64c6783bc6f4f6cc9e92bfb16c83fb214c2cbb4044993bf0 -address /run/containerd/containerd.sock
       0 Ss    \_ /pause
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 4a30785f91873a7e6a191e86928a789760a054e4fa6dcd7048a059b42cf19edf -address /run/containerd/containerd.sock
       0 Ssl   \_ /coredns -conf /etc/coredns/Corefile
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 649a507d45831aca1de5231b49afc8ff37d90add813e7ecd451d12eedd785b0c -address /run/containerd/containerd.sock
       0 Ssl   \_ /coredns -conf /etc/coredns/Corefile
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 62b369de8d8cece4d33ec9fda4d23a9718379a8df8b30173d68f20bff830fed2 -address /run/containerd/containerd.sock
       0 Ss    \_ /pause
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 7cbb177bee18dbdeed21fb90e74378e2081436ad5bf116b36ad5077fe382df30 -address /run/containerd/containerd.sock
       0 Ss    \_ /bin/bash /usr/local/bin/run.sh
       0 S         \_ nginx: master process nginx -g daemon off;
   65534 S             \_ nginx: worker process
       0 Ss   /lib/systemd/systemd --user
       0 S     \_ (sd-pam)
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id 6669168db70db4e6c741e8a047942af06dd745fae4d594291d1d6e1077b05082 -address /run/containerd/containerd.sock
       0 Ss    \_ /pause
       0 Sl   /usr/bin/containerd-shim-runc-v2 -namespace moby -id d5fa78fa31f11a4c5fb9fd2e853a00f0e60e414a7bce2e0d8fcd1f6ab2b30074 -address /run/containerd/containerd.sock
     101 Ss    \_ /usr/bin/dumb-init -- /nginx-ingress-controller --publish-service=ingress-nginx/ingress-nginx-controller --election-id=ingress-controller-leader --ingress-class=nginx --configmap=ingress-nginx/ingress-nginx-controller --validating-webhook=:8443 --validating-webhook-certificate=/usr/local/certificates/cert --validating-webhook-key=/usr/local/certificates/key
     101 Ssl       \_ /nginx-ingress-controller --publish-service=ingress-nginx/ingress-nginx-controller --election-id=ingress-controller-leader --ingress-class=nginx --configmap=ingress-nginx/ingress-nginx-controller --validating-webhook=:8443 --validating-webhook-certificate=/usr/local/certificates/cert --validating-webhook-key=/usr/local/certificates/key
     101 S             \_ nginx: master process /usr/local/nginx/sbin/nginx -c /etc/nginx/nginx.conf
     101 Sl                \_ nginx: worker process
     101 Sl                \_ nginx: worker process
     101 Sl                \_ nginx: worker process
     101 Sl                \_ nginx: worker process
     101 S                 \_ nginx: cache manager process
There's a lot going on there. Some bits are obvious; we can see the nginx ingress controller, our echoserver (the other nginx process hanging off /usr/local/bin/run.sh), and some things that look related to weave. The rest appears to be Kubernetes-related infrastructure. kube-scheduler, kube-controller-manager, kube-apiserver and kube-proxy all look like core Kubernetes bits. etcd is a distributed, reliable key-value store. coredns is a DNS server, with plugins for Kubernetes and etcd. What does Docker claim is happening?
docker ps
CONTAINER ID   IMAGE                                 COMMAND                  CREATED      STATUS      PORTS     NAMES
d5fa78fa31f1   k8s.gcr.io/ingress-nginx/controller   "/usr/bin/dumb-init  "   3 days ago   Up 3 days             k8s_controller_ingress-nginx-controller-5b74bc9868-bczdr_ingress-nginx_4d7d3d81-a769-4de9-a4fb-04763b7c1605_0
6669168db70d   k8s.gcr.io/pause:3.4.1                "/pause"                 3 days ago   Up 3 days             k8s_POD_ingress-nginx-controller-5b74bc9868-bczdr_ingress-nginx_4d7d3d81-a769-4de9-a4fb-04763b7c1605_0
7cbb177bee18   k8s.gcr.io/echoserver                 "/usr/local/bin/run. "   3 days ago   Up 3 days             k8s_echoserver_hello-node-59bffcc9fd-8hkgb_default_c7111c9e-7131-40e0-876d-be89d5ca1812_0
62b369de8d8c   k8s.gcr.io/pause:3.4.1                "/pause"                 3 days ago   Up 3 days             k8s_POD_hello-node-59bffcc9fd-8hkgb_default_c7111c9e-7131-40e0-876d-be89d5ca1812_0
649a507d4583   296a6d5035e2                          "/coredns -conf /etc "   4 days ago   Up 4 days             k8s_coredns_coredns-558bd4d5db-flrfq_kube-system_f8b2b52e-6673-4966-82b1-3fbe052a0297_0
4a30785f9187   296a6d5035e2                          "/coredns -conf /etc "   4 days ago   Up 4 days             k8s_coredns_coredns-558bd4d5db-4nvrg_kube-system_1976f4d6-647c-45ca-b268-95f071f064d5_0
9ffd6b668ddf   k8s.gcr.io/pause:3.4.1                "/pause"                 4 days ago   Up 4 days             k8s_POD_coredns-558bd4d5db-flrfq_kube-system_f8b2b52e-6673-4966-82b1-3fbe052a0297_0
534c0a698478   k8s.gcr.io/pause:3.4.1                "/pause"                 4 days ago   Up 4 days             k8s_POD_coredns-558bd4d5db-4nvrg_kube-system_1976f4d6-647c-45ca-b268-95f071f064d5_0
36b418e69ae7   df29c0a4002c                          "/home/weave/launch. "   4 days ago   Up 4 days             k8s_weave_weave-net-mchmg_kube-system_b9af9615-8cde-4a18-8555-6da1f51b7136_1
48d735f7f44e   weaveworks/weave-npc                  "/usr/bin/launch.sh"     4 days ago   Up 4 days             k8s_weave-npc_weave-net-mchmg_kube-system_b9af9615-8cde-4a18-8555-6da1f51b7136_0
7104f65b5d92   k8s.gcr.io/pause:3.4.1                "/pause"                 4 days ago   Up 4 days             k8s_POD_weave-net-mchmg_kube-system_b9af9615-8cde-4a18-8555-6da1f51b7136_0
26d92a720c56   4359e752b596                          "/usr/local/bin/kube "   4 days ago   Up 4 days             k8s_kube-proxy_kube-proxy-6d8kg_kube-system_8bf2d7ec-4850-427f-860f-465a9ff84841_0
73fae81715b6   k8s.gcr.io/pause:3.4.1                "/pause"                 4 days ago   Up 4 days             k8s_POD_kube-proxy-6d8kg_kube-system_8bf2d7ec-4850-427f-860f-465a9ff84841_0
89f35bf7a825   771ffcf9ca63                          "kube-apiserver --ad "   4 days ago   Up 4 days             k8s_kube-apiserver_kube-apiserver-udon_kube-system_1af8c5f362b7b02269f4d244cb0e6fbf_0
afa9798c9f66   a4183b88f6e6                          "kube-scheduler --au "   4 days ago   Up 4 days             k8s_kube-scheduler_kube-scheduler-udon_kube-system_629dc49dfd9f7446eb681f1dcffe6d74_0
2dabff6e4f59   0369cf4303ff                          "etcd --advertise-cl "   4 days ago   Up 4 days             k8s_etcd_etcd-udon_kube-system_c2a3008c1d9895f171cd394e38656ea0_0
4b3708b62f4d   e16544fd47b0                          "kube-controller-man "   4 days ago   Up 4 days             k8s_kube-controller-manager_kube-controller-manager-udon_kube-system_1d1b9018c3c6e7aa2e803c6e9ccd2eab_0
fd95c597ff31   k8s.gcr.io/pause:3.4.1                "/pause"                 4 days ago   Up 4 days             k8s_POD_kube-scheduler-udon_kube-system_629dc49dfd9f7446eb681f1dcffe6d74_0
589c1545d9e0   k8s.gcr.io/pause:3.4.1                "/pause"                 4 days ago   Up 4 days             k8s_POD_kube-controller-manager-udon_kube-system_1d1b9018c3c6e7aa2e803c6e9ccd2eab_0
6f417fd8a8c5   k8s.gcr.io/pause:3.4.1                "/pause"                 4 days ago   Up 4 days             k8s_POD_kube-apiserver-udon_kube-system_1af8c5f362b7b02269f4d244cb0e6fbf_0
c2ff2c50f0bc   k8s.gcr.io/pause:3.4.1                "/pause"                 4 days ago   Up 4 days             k8s_POD_etcd-udon_kube-system_c2a3008c1d9895f171cd394e38656ea0_0
Ok, that's interesting. Before we dig into it, what does Kubernetes say? (I've trimmed the RESTARTS + AGE columns to make things fit a bit better here; they weren't interesting.)
noodles@udon:~$ kubectl get pods --all-namespaces
NAMESPACE       NAME                                        READY   STATUS
default         hello-node-59bffcc9fd-8hkgb                 1/1     Running
ingress-nginx   ingress-nginx-admission-create-8jgkt        0/1     Completed
ingress-nginx   ingress-nginx-admission-patch-jdq4t         0/1     Completed
ingress-nginx   ingress-nginx-controller-5b74bc9868-bczdr   1/1     Running
kube-system     coredns-558bd4d5db-4nvrg                    1/1     Running
kube-system     coredns-558bd4d5db-flrfq                    1/1     Running
kube-system     etcd-udon                                   1/1     Running
kube-system     kube-apiserver-udon                         1/1     Running
kube-system     kube-controller-manager-udon                1/1     Running
kube-system     kube-proxy-6d8kg                            1/1     Running
kube-system     kube-scheduler-udon                         1/1     Running
kube-system     weave-net-mchmg                             2/2     Running
So there are a lot more Docker instances running than Kubernetes pods. What's happening there? Well, it turns out that Kubernetes builds pods from multiple different Docker instances. If you think of a traditional container as being comprised of a set of namespaces (process, network, hostname etc.) and a cgroup, then a pod is made up of the namespaces, and each docker instance within that pod has its own cgroup. Ian Lewis has a much deeper discussion in What are Kubernetes Pods Anyway?, but my takeaway is that a pod is a set of sort-of containers that are coupled. We can see this more clearly if we ask systemd for the cgroup breakdown:
systemd-cgls
Control group /:
-.slice
 user.slice 
   user-0.slice 
     session-29.scope 
        515899 sshd: root@pts/1
        515913 -bash
       3519743 systemd-cgls
       3519744 cat
     user@0.service  
       init.scope 
         515902 /lib/systemd/systemd --user
         515903 (sd-pam)
   user-1000.slice 
     user@1000.service  
       init.scope 
         2564011 /lib/systemd/systemd --user
         2564012 (sd-pam)
     session-110.scope 
       2564007 sshd: noodles [priv]
       2564040 sshd: noodles@pts/0
       2564041 -bash
 init.scope 
   1 /sbin/init
 system.slice 
   containerd.service  
       21383 /usr/bin/containerd-shim-runc-v2 -namespace moby -id fd95c597ff31 
       21408 /usr/bin/containerd-shim-runc-v2 -namespace moby -id c2ff2c50f0bc 
       21432 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 589c1545d9e0 
       21459 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 6f417fd8a8c5 
       21582 /usr/bin/containerd-shim-runc-v2 -namespace moby -id afa9798c9f66 
       21607 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 4b3708b62f4d 
       21640 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 89f35bf7a825 
       21648 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 2dabff6e4f59 
       22343 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 73fae81715b6 
       22391 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 26d92a720c56 
       26992 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 7104f65b5d92 
       27405 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 48d735f7f44e 
       27531 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 36b418e69ae7 
       27941 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 534c0a698478 
       27960 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 9ffd6b668ddf 
       28131 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 4a30785f9187 
       28159 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 649a507d4583 
      514667 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 62b369de8d8c 
      514976 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 7cbb177bee18 
      698904 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 6669168db70d 
      699284 /usr/bin/containerd-shim-runc-v2 -namespace moby -id d5fa78fa31f1 
     2805479 /usr/bin/containerd
   systemd-udevd.service 
     2805502 /lib/systemd/systemd-udevd
   cron.service 
     2805474 /usr/sbin/cron -f
   docker.service  
     528 /usr/sbin/dockerd -H fd://
   kubelet.service 
     2805501 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap 
   systemd-journald.service 
     2805505 /lib/systemd/systemd-journald
   ssh.service 
     2805500 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
   ifup@enx00e04c6851de.service 
     2805675 /sbin/dhclient -4 -v -i -pf /run/dhclient.enx00e04c6851de.pid -lf 
   rsyslog.service 
     2805488 /usr/sbin/rsyslogd -n -iNONE
   smartmontools.service 
     2805499 /usr/sbin/smartd -n
   dbus.service 
     527 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile 
   systemd-timesyncd.service 
     2805513 /lib/systemd/systemd-timesyncd
   system-getty.slice 
     getty@tty1.service 
       536 /sbin/agetty -o -p -- \u --noclear tty1 linux
   systemd-logind.service 
     533 /lib/systemd/systemd-logind
 kubepods.slice 
   kubepods-burstable.slice 
     kubepods-burstable-pod1af8c5f362b7b02269f4d244cb0e6fbf.slice 
       docker-6f417fd8a8c573a2b8f792af08cdcd7ce663457f0f7218c8d55afa3732e6ee94.scope  
         21493 /pause
       docker-89f35bf7a825eb97db7035d29aa475a3a1c8aaccda0860a46388a3a923cd10bc.scope  
         21699 kube-apiserver --advertise-address=192.168.33.147 --allow-privi 
     kubepods-burstable-podf8b2b52e_6673_4966_82b1_3fbe052a0297.slice 
       docker-649a507d45831aca1de5231b49afc8ff37d90add813e7ecd451d12eedd785b0c.scope  
         28187 /coredns -conf /etc/coredns/Corefile
       docker-9ffd6b668ddfbf3c64c6783bc6f4f6cc9e92bfb16c83fb214c2cbb4044993bf0.scope  
         27987 /pause
     kubepods-burstable-podc2a3008c1d9895f171cd394e38656ea0.slice 
       docker-c2ff2c50f0bc052feda2281741c4f37df7905e3b819294ec645148ae13c3fe1b.scope  
         21481 /pause
       docker-2dabff6e4f59c96d931d95781d28314065b46d0e6f07f8c65dc52aa465f69456.scope  
         21701 etcd --advertise-client-urls=https://192.168.33.147:2379 --cert 
     kubepods-burstable-pod629dc49dfd9f7446eb681f1dcffe6d74.slice 
       docker-fd95c597ff3171ff110b7bf440229e76c5108d5d93be75ffeab54869df734413.scope  
         21491 /pause
       docker-afa9798c9f663b21df8f38d9634469e6b4db0984124547cd472a7789c61ef752.scope  
         21680 kube-scheduler --authentication-kubeconfig=/etc/kubernetes/sche 
     kubepods-burstable-podb9af9615_8cde_4a18_8555_6da1f51b7136.slice 
       docker-48d735f7f44e3944851563f03f32c60811f81409e7378641404035dffd8c1eb4.scope  
         27424 /usr/bin/weave-npc
         27458 /usr/sbin/ulogd -v
       docker-36b418e69ae7076fe5a44d16cef223d8908016474cb65910f2fd54cca470566b.scope  
         27549 /bin/sh /home/weave/launch.sh
         27629 /home/weave/weaver --port=6783 --datapath=datapath --name=12:82 
         27825 /home/weave/kube-utils -run-reclaim-daemon -node-name=udon -pee 
       docker-7104f65b5d92a56a2df93514ed0a78cfd1090ca47b6ce4e0badc43be6c6c538e.scope  
         27011 /pause
     kubepods-burstable-pod4d7d3d81_a769_4de9_a4fb_04763b7c1605.slice 
       docker-6669168db70db4e6c741e8a047942af06dd745fae4d594291d1d6e1077b05082.scope  
         698925 /pause
       docker-d5fa78fa31f11a4c5fb9fd2e853a00f0e60e414a7bce2e0d8fcd1f6ab2b30074.scope  
          699303 /usr/bin/dumb-init -- /nginx-ingress-controller --publish-ser 
          699316 /nginx-ingress-controller --publish-service=ingress-nginx/ing 
          699405 nginx: master process /usr/local/nginx/sbin/nginx -c /etc/ngi 
         1075085 nginx: worker process
         1075086 nginx: worker process
         1075087 nginx: worker process
         1075088 nginx: worker process
         1075089 nginx: cache manager process
     kubepods-burstable-pod1976f4d6_647c_45ca_b268_95f071f064d5.slice 
       docker-4a30785f91873a7e6a191e86928a789760a054e4fa6dcd7048a059b42cf19edf.scope  
         28178 /coredns -conf /etc/coredns/Corefile
       docker-534c0a698478599277482d97a137fab8ef4d62db8a8a5cf011b4bead28246f70.scope  
         27995 /pause
     kubepods-burstable-pod1d1b9018c3c6e7aa2e803c6e9ccd2eab.slice 
       docker-589c1545d9e0cdf8ea391745c54c8f4db49f5f437b1a2e448e7744b2c12f8856.scope  
         21489 /pause
       docker-4b3708b62f4d427690f5979848c59fce522dab6c62a9c53b806ffbaef3f88e62.scope  
         21690 kube-controller-manager --authentication-kubeconfig=/etc/kubern 
   kubepods-besteffort.slice 
     kubepods-besteffort-podc7111c9e_7131_40e0_876d_be89d5ca1812.slice 
       docker-62b369de8d8cece4d33ec9fda4d23a9718379a8df8b30173d68f20bff830fed2.scope  
         514688 /pause
       docker-7cbb177bee18dbdeed21fb90e74378e2081436ad5bf116b36ad5077fe382df30.scope  
         514999 /bin/bash /usr/local/bin/run.sh
         515039 nginx: master process nginx -g daemon off;
         515040 nginx: worker process
     kubepods-besteffort-pod8bf2d7ec_4850_427f_860f_465a9ff84841.slice 
       docker-73fae81715b670255b66419a7959798b287be7bbb41e96f8b711fa529aa02f0d.scope  
         22364 /pause
       docker-26d92a720c560caaa5f8a0217bc98e486b1c032af6c7c5d75df508021d462878.scope  
         22412 /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.c 
Again, there's a lot going on here, but if you look for the kubepods.slice piece then you can see our pods are divided into two sets, kubepods-burstable.slice and kubepods-besteffort.slice. Under those you can see the individual pods, all of which have at least 2 separate cgroups, one of which is running /pause. Turns out this is a generic Kubernetes image which basically performs the process reaping that an init process would do on a normal system; it just sits and waits for processes to exit and cleans them up. Again, Ian Lewis has more details on the pause container. Finally let's dig into the actual containers. The pause container seems like a good place to start. We can examine the details of where the filesystem is (this may differ if you're not using the overlay2 storage driver). The hex string is the container ID listed by docker ps.
# docker inspect --format='{{.GraphDriver.Data.MergedDir}}' 6669168db70d
/var/lib/docker/overlay2/5a2d76012476349e6b58eb6a279bac400968cefae8537082ea873b2e791ff3c6/merged
# cd /var/lib/docker/overlay2/5a2d76012476349e6b58eb6a279bac400968cefae8537082ea873b2e791ff3c6/merged
# find . | sed -e 's;^./;;'
pause
proc
.dockerenv
etc
etc/resolv.conf
etc/hostname
etc/mtab
etc/hosts
sys
dev
dev/shm
dev/pts
dev/console
# file pause
pause: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 3.2.0, BuildID[sha1]=d35dab7152881e37373d819f6864cd43c0124a65, stripped
This is a nice, minimal container. The pause binary is statically linked, so there are no extra libraries required and it's just a basic set of support devices and files. I doubt the pieces in /etc are even required. Let's try the echoserver next:
# docker inspect --format='{{.GraphDriver.Data.MergedDir}}' 7cbb177bee18
/var/lib/docker/overlay2/09042bc1aff16a9cba43f1a6a68f7786c4748e989a60833ec7417837c4bfaacb/merged
# cd /var/lib/docker/overlay2/09042bc1aff16a9cba43f1a6a68f7786c4748e989a60833ec7417837c4bfaacb/merged
# find . | wc -l
3358
Wow. That's a lot more stuff. Poking /etc/os-release shows why:
# grep PRETTY etc/os-release
PRETTY_NAME="Ubuntu 16.04.2 LTS"
Aha. It's an Ubuntu-based image. We can cut straight to the chase with the nginx ingress container:
# docker exec d5fa78fa31f1 grep PRETTY /etc/os-release
PRETTY_NAME="Alpine Linux v3.13"
That's a bit more reasonable an image for a container; Alpine Linux is a much smaller distro. I don't feel there's a lot more poking to do here. It's not something I'd expect to do on a normal Kubernetes setup, but I wanted to dig under the hood to make sure it really was just a normal container situation. I think the next steps involve adding a bit more complexity - that means building a cluster with more than a single node, and then running an application that's a bit more complicated. That should help explore two major advantages of running this sort of setup: resilience when a node dies, and the ability to scale out beyond what a single node can do.
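As a footnote to the pod discussion above, one quick way to convince yourself that the containers in a pod really do share namespaces is to compare their namespace links. This is a sketch, not from the original investigation, using the PIDs of the pause process and the ingress controller from the nginx ingress pod in the cgroup listing:
# Both processes belong to the same ingress pod, so their network
# namespace links should resolve to the same net:[...] inode.
readlink /proc/698925/ns/net /proc/699316/ns/net
# The PID namespaces should differ; process namespace sharing is
# off by default in Kubernetes pods.
readlink /proc/698925/ns/pid /proc/699316/ns/pid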

28 May 2021

Jonathan McDowell: Trying to understand Kubernetes networking

I previously built a single node Kubernetes cluster as a test environment to learn more about it. The first thing I want to try to understand is its networking. In particular the IP addresses that are listed are all 10.* and my host's network is a 192.168/24. I understand each pod gets its own virtual ethernet interface and associated IP address, and these are generally private within the cluster (and firewalled off other than for exposed services). What does that actually look like?
$ ip route
default via 192.168.53.1 dev enx00e04c6851de
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.0.0/24 dev weave proto kernel scope link src 192.168.0.1
192.168.53.0/24 dev enx00e04c6851de proto kernel scope link src 192.168.53.147
Huh. No sign of any way to get to 10.107.66.138 (the IP my echoserver from the previous post is available on directly from the host). What about network interfaces? (under the cut because it's lengthy)
ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: enx00e04c6851de: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:e0:4c:68:51:de brd ff:ff:ff:ff:ff:ff
    inet 192.168.53.147/24 brd 192.168.53.255 scope global dynamic enx00e04c6851de
       valid_lft 41571sec preferred_lft 41571sec
3: wlp1s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 74:d8:3e:70:3b:18 brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:18:04:9e:08 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
5: datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether d2:5a:fd:c1:56:23 brd ff:ff:ff:ff:ff:ff
7: weave: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP group default qlen 1000
    link/ether 12:82:8f:ed:c7:bf brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.1/24 brd 192.168.0.255 scope global weave
       valid_lft forever preferred_lft forever
9: vethwe-datapath@vethwe-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master datapath state UP group default
    link/ether b6:49:88:d6:6d:84 brd ff:ff:ff:ff:ff:ff
10: vethwe-bridge@vethwe-datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default
    link/ether 6e:6c:03:1d:e5:0e brd ff:ff:ff:ff:ff:ff
11: vxlan-6784: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65535 qdisc noqueue master datapath state UNKNOWN group default qlen 1000
    link/ether 9a:af:c5:0a:b3:fd brd ff:ff:ff:ff:ff:ff
13: vethwepl534c0a6@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default
    link/ether 1e:ac:f1:85:61:9a brd ff:ff:ff:ff:ff:ff link-netnsid 0
15: vethwepl9ffd6b6@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default
    link/ether 56:ca:71:2a:ab:39 brd ff:ff:ff:ff:ff:ff link-netnsid 1
17: vethwepl62b369d@if16: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default
    link/ether e2:a0:bb:ee:fc:73 brd ff:ff:ff:ff:ff:ff link-netnsid 2
23: vethwepl6669168@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default
    link/ether f2:e7:e6:95:e0:61 brd ff:ff:ff:ff:ff:ff link-netnsid 3
That looks like a collection of virtual ethernet devices that are being managed by the weave networking plugin, and presumably partnered inside each pod. They're bridged to the weave interface (the master weave bit). Still no clues about the 10.* range. What about ARP?
ip neigh
192.168.53.1 dev enx00e04c6851de lladdr e4:8d:8c:35:98:d5 DELAY
192.168.0.4 dev datapath lladdr da:22:06:96:50:cb STALE
192.168.0.2 dev weave lladdr 66:eb:ce:16:3c:62 REACHABLE
192.168.53.136 dev enx00e04c6851de lladdr 00:e0:4c:39:f2:54 REACHABLE
192.168.0.6 dev weave lladdr 56:a9:f0:d2:9e:f3 STALE
192.168.0.3 dev datapath lladdr f2:42:c9:c3:08:71 STALE
192.168.0.3 dev weave lladdr f2:42:c9:c3:08:71 REACHABLE
192.168.0.2 dev datapath lladdr 66:eb:ce:16:3c:62 STALE
192.168.0.6 dev datapath lladdr 56:a9:f0:d2:9e:f3 STALE
192.168.0.4 dev weave lladdr da:22:06:96:50:cb STALE
192.168.0.5 dev datapath lladdr fe:6f:1b:14:56:5a STALE
192.168.0.5 dev weave lladdr fe:6f:1b:14:56:5a REACHABLE
Nope. That just looks like addresses on the weave-managed bridge. Alright. What about firewalling?
nft list ruleset
table ip nat {
	chain DOCKER {
		iifname "docker0" counter packets 0 bytes 0 return
	}
	chain POSTROUTING {
		type nat hook postrouting priority srcnat; policy accept;
		counter packets 531750 bytes 31913539 jump KUBE-POSTROUTING
		oifname != "docker0" ip saddr 172.17.0.0/16 counter packets 1 bytes 84 masquerade
		counter packets 525600 bytes 31544134 jump WEAVE
	}
	chain PREROUTING {
		type nat hook prerouting priority dstnat; policy accept;
		counter packets 180 bytes 12525 jump KUBE-SERVICES
		fib daddr type local counter packets 23 bytes 1380 jump DOCKER
	}
	chain OUTPUT {
		type nat hook output priority -100; policy accept;
		counter packets 527005 bytes 31628455 jump KUBE-SERVICES
		ip daddr != 127.0.0.0/8 fib daddr type local counter packets 285425 bytes 17125524 jump DOCKER
	}
	chain KUBE-MARK-DROP {
		counter packets 0 bytes 0 meta mark set mark or 0x8000
	}
	chain KUBE-MARK-MASQ {
		counter packets 0 bytes 0 meta mark set mark or 0x4000
	}
	chain KUBE-POSTROUTING {
		mark and 0x4000 != 0x4000 counter packets 4622 bytes 277720 return
		counter packets 0 bytes 0 meta mark set mark xor 0x4000
		counter packets 0 bytes 0 masquerade
	}
	chain KUBE-KUBELET-CANARY {
	}
	chain INPUT {
		type nat hook input priority 100; policy accept;
	}
	chain KUBE-PROXY-CANARY {
	}
	chain KUBE-SERVICES {
		meta l4proto tcp ip daddr 10.96.0.10 tcp dport 9153 counter packets 0 bytes 0 jump KUBE-SVC-JD5MR3NA4I4DYORP
		meta l4proto tcp ip daddr 10.107.66.138 tcp dport 8080 counter packets 1 bytes 60 jump KUBE-SVC-666FUMINWJLRRQPD
		meta l4proto tcp ip daddr 10.111.16.129 tcp dport 443 counter packets 0 bytes 0 jump KUBE-SVC-EZYNCFY2F7N6OQA2
		meta l4proto tcp ip daddr 10.96.9.41 tcp dport 443 counter packets 0 bytes 0 jump KUBE-SVC-EDNDUDH2C75GIR6O
		meta l4proto tcp ip daddr 192.168.53.147 tcp dport 443 counter packets 0 bytes 0 jump KUBE-XLB-EDNDUDH2C75GIR6O
		meta l4proto tcp ip daddr 10.96.9.41 tcp dport 80 counter packets 0 bytes 0 jump KUBE-SVC-CG5I4G2RS3ZVWGLK
		meta l4proto tcp ip daddr 192.168.53.147 tcp dport 80 counter packets 0 bytes 0 jump KUBE-XLB-CG5I4G2RS3ZVWGLK
		meta l4proto tcp ip daddr 10.96.0.1 tcp dport 443 counter packets 0 bytes 0 jump KUBE-SVC-NPX46M4PTMTKRN6Y
		meta l4proto udp ip daddr 10.96.0.10 udp dport 53 counter packets 0 bytes 0 jump KUBE-SVC-TCOU7JCQXEZGVUNU
		meta l4proto tcp ip daddr 10.96.0.10 tcp dport 53 counter packets 0 bytes 0 jump KUBE-SVC-ERIFXISQEP7F7OF4
		fib daddr type local counter packets 3312 bytes 198720 jump KUBE-NODEPORTS
	}
	chain KUBE-NODEPORTS {
		meta l4proto tcp tcp dport 31529 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
		meta l4proto tcp tcp dport 31529 counter packets 0 bytes 0 jump KUBE-SVC-666FUMINWJLRRQPD
		meta l4proto tcp ip saddr 127.0.0.0/8 tcp dport 30894 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
		meta l4proto tcp tcp dport 30894 counter packets 0 bytes 0 jump KUBE-XLB-EDNDUDH2C75GIR6O
		meta l4proto tcp ip saddr 127.0.0.0/8 tcp dport 32740 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
		meta l4proto tcp tcp dport 32740 counter packets 0 bytes 0 jump KUBE-XLB-CG5I4G2RS3ZVWGLK
	}
	chain KUBE-SVC-NPX46M4PTMTKRN6Y {
		counter packets 0 bytes 0 jump KUBE-SEP-Y6PHKONXBG3JINP2
	}
	chain KUBE-SEP-Y6PHKONXBG3JINP2 {
		ip saddr 192.168.53.147 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
		meta l4proto tcp counter packets 0 bytes 0 dnat to 192.168.53.147:6443
	}
	chain WEAVE {
		# match-set weaver-no-masq-local dst counter packets 135966 bytes 8160820 return
		ip saddr 192.168.0.0/24 ip daddr 224.0.0.0/4 counter packets 0 bytes 0 return
		ip saddr != 192.168.0.0/24 ip daddr 192.168.0.0/24 counter packets 0 bytes 0 masquerade
		ip saddr 192.168.0.0/24 ip daddr != 192.168.0.0/24 counter packets 33 bytes 2941 masquerade
	}
	chain WEAVE-CANARY {
	}
	chain KUBE-SVC-JD5MR3NA4I4DYORP {
		counter packets 0 bytes 0 jump KUBE-SEP-6JI23ZDEH4VLR5EN
		counter packets 0 bytes 0 jump KUBE-SEP-FATPLMAF37ZNQP5P
	}
	chain KUBE-SEP-6JI23ZDEH4VLR5EN {
		ip saddr 192.168.0.2 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
		meta l4proto tcp counter packets 0 bytes 0 dnat to 192.168.0.2:9153
	}
	chain KUBE-SVC-TCOU7JCQXEZGVUNU {
		counter packets 0 bytes 0 jump KUBE-SEP-JTN4UBVS7OG5RONX
		counter packets 0 bytes 0 jump KUBE-SEP-4TCKAEJ6POVEFPVW
	}
	chain KUBE-SEP-JTN4UBVS7OG5RONX {
		ip saddr 192.168.0.2 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
		meta l4proto udp counter packets 0 bytes 0 dnat to 192.168.0.2:53
	}
	chain KUBE-SVC-ERIFXISQEP7F7OF4 {
		counter packets 0 bytes 0 jump KUBE-SEP-UPZX2EM3TRFH2ASL
		counter packets 0 bytes 0 jump KUBE-SEP-KPHYKKPVMB473Z76
	}
	chain KUBE-SEP-UPZX2EM3TRFH2ASL {
		ip saddr 192.168.0.2 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
		meta l4proto tcp counter packets 0 bytes 0 dnat to 192.168.0.2:53
	}
	chain KUBE-SEP-4TCKAEJ6POVEFPVW {
		ip saddr 192.168.0.3 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
		meta l4proto udp counter packets 0 bytes 0 dnat to 192.168.0.3:53
	}
	chain KUBE-SEP-KPHYKKPVMB473Z76 {
		ip saddr 192.168.0.3 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
		meta l4proto tcp counter packets 0 bytes 0 dnat to 192.168.0.3:53
	}
	chain KUBE-SEP-FATPLMAF37ZNQP5P {
		ip saddr 192.168.0.3 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
		meta l4proto tcp counter packets 0 bytes 0 dnat to 192.168.0.3:9153
	}
	chain KUBE-SVC-666FUMINWJLRRQPD {
		counter packets 1 bytes 60 jump KUBE-SEP-LYLDBZYLHY4MT3AQ
	}
	chain KUBE-SEP-LYLDBZYLHY4MT3AQ {
		ip saddr 192.168.0.4 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
		meta l4proto tcp counter packets 1 bytes 60 dnat to 192.168.0.4:8080
	}
	chain KUBE-XLB-EDNDUDH2C75GIR6O {
		fib saddr type local counter packets 0 bytes 0 jump KUBE-MARK-MASQ
		fib saddr type local counter packets 0 bytes 0 jump KUBE-SVC-EDNDUDH2C75GIR6O
		counter packets 0 bytes 0 jump KUBE-SEP-BLQHCYCSXY3NRKLC
	}
	chain KUBE-XLB-CG5I4G2RS3ZVWGLK {
		fib saddr type local counter packets 0 bytes 0 jump KUBE-MARK-MASQ
		fib saddr type local counter packets 0 bytes 0 jump KUBE-SVC-CG5I4G2RS3ZVWGLK
		counter packets 0 bytes 0 jump KUBE-SEP-5XVRKWM672JGTWXH
	}
	chain KUBE-SVC-EDNDUDH2C75GIR6O {
		counter packets 0 bytes 0 jump KUBE-SEP-BLQHCYCSXY3NRKLC
	}
	chain KUBE-SEP-BLQHCYCSXY3NRKLC {
		ip saddr 192.168.0.5 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
		meta l4proto tcp counter packets 0 bytes 0 dnat to 192.168.0.5:443
	}
	chain KUBE-SVC-CG5I4G2RS3ZVWGLK {
		counter packets 0 bytes 0 jump KUBE-SEP-5XVRKWM672JGTWXH
	}
	chain KUBE-SEP-5XVRKWM672JGTWXH {
		ip saddr 192.168.0.5 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
		meta l4proto tcp counter packets 0 bytes 0 dnat to 192.168.0.5:80
	}
	chain KUBE-SVC-EZYNCFY2F7N6OQA2 {
		counter packets 0 bytes 0 jump KUBE-SEP-JYW326XAJ4KK7QPG
	}
	chain KUBE-SEP-JYW326XAJ4KK7QPG {
		ip saddr 192.168.0.5 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
		meta l4proto tcp counter packets 0 bytes 0 dnat to 192.168.0.5:8443
	}
}
table ip filter {
	chain DOCKER {
	}
	chain DOCKER-ISOLATION-STAGE-1 {
		iifname "docker0" oifname != "docker0" counter packets 0 bytes 0 jump DOCKER-ISOLATION-STAGE-2
		counter packets 0 bytes 0 return
	}
	chain DOCKER-ISOLATION-STAGE-2 {
		oifname "docker0" counter packets 0 bytes 0 drop
		counter packets 0 bytes 0 return
	}
	chain FORWARD {
		type filter hook forward priority filter; policy drop;
		iifname "weave" counter packets 213 bytes 54014 jump WEAVE-NPC-EGRESS
		oifname "weave" counter packets 150 bytes 30038 jump WEAVE-NPC
		oifname "weave" ct state new counter packets 0 bytes 0 log group 86
		oifname "weave" counter packets 0 bytes 0 drop
		iifname "weave" oifname != "weave" counter packets 33 bytes 2941 accept
		oifname "weave" ct state related,established counter packets 0 bytes 0 accept
		counter packets 0 bytes 0 jump KUBE-FORWARD
		ct state new counter packets 0 bytes 0 jump KUBE-SERVICES
		ct state new counter packets 0 bytes 0 jump KUBE-EXTERNAL-SERVICES
		counter packets 0 bytes 0 jump DOCKER-USER
		counter packets 0 bytes 0 jump DOCKER-ISOLATION-STAGE-1
		oifname "docker0" ct state related,established counter packets 0 bytes 0 accept
		oifname "docker0" counter packets 0 bytes 0 jump DOCKER
		iifname "docker0" oifname != "docker0" counter packets 0 bytes 0 accept
		iifname "docker0" oifname "docker0" counter packets 0 bytes 0 accept
	}
	chain DOCKER-USER {
		counter packets 0 bytes 0 return
	}
	chain KUBE-FIREWALL {
		mark and 0x8000 == 0x8000 counter packets 0 bytes 0 drop
		ip saddr != 127.0.0.0/8 ip daddr 127.0.0.0/8 ct status dnat counter packets 0 bytes 0 drop
	}
	chain OUTPUT {
		type filter hook output priority filter; policy accept;
		ct state new counter packets 527014 bytes 31628984 jump KUBE-SERVICES
		counter packets 36324809 bytes 6021214027 jump KUBE-FIREWALL
		meta l4proto != esp mark and 0x20000 == 0x20000 counter packets 0 bytes 0 drop
	}
	chain INPUT {
		type filter hook input priority filter; policy accept;
		counter packets 35869492 bytes 5971008896 jump KUBE-NODEPORTS
		ct state new counter packets 390938 bytes 23457377 jump KUBE-EXTERNAL-SERVICES
		counter packets 36249774 bytes 6030068622 jump KUBE-FIREWALL
		meta l4proto tcp ip daddr 127.0.0.1 tcp dport 6784 fib saddr type != local ct state != related,established counter packets 0 bytes 0 drop
		iifname "weave" counter packets 907273 bytes 88697229 jump WEAVE-NPC-EGRESS
		counter packets 34809601 bytes 5818213726 jump WEAVE-IPSEC-IN
	}
	chain KUBE-KUBELET-CANARY {
	}
	chain KUBE-PROXY-CANARY {
	}
	chain KUBE-EXTERNAL-SERVICES {
	}
	chain KUBE-NODEPORTS {
		meta l4proto tcp tcp dport 32196 counter packets 0 bytes 0 accept
		meta l4proto tcp tcp dport 32196 counter packets 0 bytes 0 accept
	}
	chain KUBE-SERVICES {
	}
	chain KUBE-FORWARD {
		ct state invalid counter packets 0 bytes 0 drop
		mark and 0x4000 == 0x4000 counter packets 0 bytes 0 accept
		ct state related,established counter packets 0 bytes 0 accept
		ct state related,established counter packets 0 bytes 0 accept
	}
	chain WEAVE-NPC-INGRESS {
	}
	chain WEAVE-NPC-DEFAULT {
		# match-set weave-;rGqyMIl1HN^cfDki~Z$3]6!N dst counter packets 14 bytes 840 accept
		# match-set weave-P.B !ZhkAr5q=XZ?3 tMBA+0 dst counter packets 0 bytes 0 accept
		# match-set weave-Rzff h:=]JaaJl/G;(XJpGjZ[ dst counter packets 0 bytes 0 accept
		# match-set weave-]B*(W?)t*z5O17G044[gUo#$l dst counter packets 0 bytes 0 accept
		# match-set weave-iLgO^ o=U/*%KE[@=W:l~ 9T dst counter packets 9 bytes 540 accept
	}
	chain WEAVE-NPC {
		ct state related,established counter packets 124 bytes 28478 accept
		ip daddr 224.0.0.0/4 counter packets 0 bytes 0 accept
		# PHYSDEV match --physdev-out vethwe-bridge --physdev-is-bridged counter packets 3 bytes 180 accept
		ct state new counter packets 23 bytes 1380 jump WEAVE-NPC-DEFAULT
		ct state new counter packets 0 bytes 0 jump WEAVE-NPC-INGRESS
	}
	chain WEAVE-NPC-EGRESS-ACCEPT {
		counter packets 48 bytes 3769 meta mark set mark or 0x40000
	}
	chain WEAVE-NPC-EGRESS-CUSTOM {
	}
	chain WEAVE-NPC-EGRESS-DEFAULT {
		# match-set weave-s_+ChJId4Uy_$ G;WdH ~TK)I src counter packets 0 bytes 0 jump WEAVE-NPC-EGRESS-ACCEPT
		# match-set weave-s_+ChJId4Uy_$ G;WdH ~TK)I src counter packets 0 bytes 0 return
		# match-set weave-E1ney4o[ojNrLk.6rOHi;7MPE src counter packets 31 bytes 2749 jump WEAVE-NPC-EGRESS-ACCEPT
		# match-set weave-E1ney4o[ojNrLk.6rOHi;7MPE src counter packets 31 bytes 2749 return
		# match-set weave-41s)5vQ^o/xWGz6a20N:~?# E src counter packets 0 bytes 0 jump WEAVE-NPC-EGRESS-ACCEPT
		# match-set weave-41s)5vQ^o/xWGz6a20N:~?# E src counter packets 0 bytes 0 return
		# match-set weave-sui%__gZ kX~oZgI_Ttqp=Dp src counter packets 0 bytes 0 jump WEAVE-NPC-EGRESS-ACCEPT
		# match-set weave-sui%__gZ kX~oZgI_Ttqp=Dp src counter packets 0 bytes 0 return
		# match-set weave-nmMUaDKV*YkQcP5s?Q[R54Ep3 src counter packets 17 bytes 1020 jump WEAVE-NPC-EGRESS-ACCEPT
		# match-set weave-nmMUaDKV*YkQcP5s?Q[R54Ep3 src counter packets 17 bytes 1020 return
	}
	chain WEAVE-NPC-EGRESS {
		ct state related,established counter packets 907425 bytes 88746642 accept
		# PHYSDEV match --physdev-in vethwe-bridge --physdev-is-bridged counter packets 0 bytes 0 return
		fib daddr type local counter packets 11 bytes 640 return
		ip daddr 224.0.0.0/4 counter packets 0 bytes 0 return
		ct state new counter packets 50 bytes 3961 jump WEAVE-NPC-EGRESS-DEFAULT
		ct state new mark and 0x40000 != 0x40000 counter packets 2 bytes 192 jump WEAVE-NPC-EGRESS-CUSTOM
	}
	chain WEAVE-IPSEC-IN {
	}
	chain WEAVE-CANARY {
	}
}
table ip mangle {
	chain KUBE-KUBELET-CANARY {
	}
	chain PREROUTING {
		type filter hook prerouting priority mangle; policy accept;
	}
	chain INPUT {
		type filter hook input priority mangle; policy accept;
		counter packets 35716863 bytes 5906910315 jump WEAVE-IPSEC-IN
	}
	chain FORWARD {
		type filter hook forward priority mangle; policy accept;
	}
	chain OUTPUT {
		type route hook output priority mangle; policy accept;
		counter packets 35804064 bytes 5938944956 jump WEAVE-IPSEC-OUT
	}
	chain POSTROUTING {
		type filter hook postrouting priority mangle; policy accept;
	}
	chain KUBE-PROXY-CANARY {
	}
	chain WEAVE-IPSEC-IN {
	}
	chain WEAVE-IPSEC-IN-MARK {
		counter packets 0 bytes 0 meta mark set mark or 0x20000
	}
	chain WEAVE-IPSEC-OUT {
	}
	chain WEAVE-IPSEC-OUT-MARK {
		counter packets 0 bytes 0 meta mark set mark or 0x20000
	}
	chain WEAVE-CANARY {
	}
}
Wow. That's a lot of nftables entries, but it explains what's going on. We have a nat entry for:
meta l4proto tcp ip daddr 10.107.66.138 tcp dport 8080 counter packets 1 bytes 60 jump KUBE-SVC-666FUMINWJLRRQPD
which ends up going to KUBE-SEP-LYLDBZYLHY4MT3AQ and:
meta l4proto tcp counter packets 1 bytes 60 dnat to 192.168.0.4:8080
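(Rather than eyeballing the full dump, nft will happily list a single chain; a small convenience that wasn't in the original post:)
nft list chain ip nat KUBE-SVC-666FUMINWJLRRQPD
nft list chain ip nat KUBE-SEP-LYLDBZYLHY4MT3AQ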
So packets headed for our echoserver are eventually ending up in a container that has a local IP address of 192.168.0.4. Which we can see in our routing table via the weave interface. Mystery explained. We can see the ingress for the externally visible HTTP service as well:
meta l4proto tcp ip daddr 192.168.53.147 tcp dport 80 counter packets 0 bytes 0 jump KUBE-XLB-CG5I4G2RS3ZVWGLK
which ends up redirected to:
meta l4proto tcp counter packets 0 bytes 0 dnat to 192.168.0.5:80
So from that we'd expect the IP inside the echoserver pod to be 192.168.0.4 and the IP address inside our nginx ingress pod to be 192.168.0.5. Let's look:
root@udon:/# docker ps | grep echoserver
7cbb177bee18   k8s.gcr.io/echoserver                 "/usr/local/bin/run. "   3 days ago   Up 3 days             k8s_echoserver_hello-node-59bffcc9fd-8hkgb_default_c7111c9e-7131-40e0-876d-be89d5ca1812_0
root@udon:/# docker exec -it 7cbb177bee18 /bin/bash
root@hello-node-59bffcc9fd-8hkgb:/# awk '/32 host/ { print f } { f=$2 }' <<< "$(</proc/net/fib_trie)" | sort -u
127.0.0.1
192.168.0.4
It's a slightly awkward method of determining the local IP addresses, forced by the stripped down nature of the container, but it clearly shows the expected 192.168.0.4 address. I've touched here upon the ability to actually enter a container and have a poke around its running environment by using docker directly. Next step is to use that to investigate what containers have actually been spun up and what they're doing. I'll also revisit networking when I get to the point of building a multi-node cluster, to examine how the bridging between different hosts is done.
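A less awkward alternative, sketched here as an aside rather than something from the post itself, is to stay on the host and use nsenter to run the host's own tools inside the container's network namespace:
# Grab the host PID of the container's main process...
PID=$(docker inspect --format '{{.State.Pid}}' 7cbb177bee18)
# ...then run the host's ip binary inside that container's network namespace.
nsenter -t "$PID" -n ip addr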

20 May 2021

Jonathan McDowell: Losing control to Kubernetes

GMK NucBox Kubernetes is about giving up control. As someone who likes to understand what's going on, that's made it hard for me to embrace it. I've also mostly been able to ignore it, which has helped. However I'm aware it's incredibly popular, and there's some infrastructure at work that uses it. While it's not my responsibility, I always find having an actual implementation of something is useful in understanding it generally, so I decided it was time to dig in and learn something new. First up, I should say I understand the trade-off here: handing a bunch of decisions about the underlying platform off to Kubernetes allows development/deployment to concentrate on a nice consistent environment. I get the analogy with the shipping container model, where you can abstract out both sides knowing all you have to do is conform to the TEU API. In terms of the underlying concepts I've got some virtualisation and container experience, so I'm not coming at this as a complete newcomer. And I understand multi-site dynamically routed networks. That said, let's start with a basic goal. I'd like to understand k8s (see, I can be cool and use the short name) enough to be comfortable with what's going on under the hood and be able to examine a running instance safely (i.e. enough confidence about pulling logs, probing state etc. without fearing I might modify state). That'll mean when I come across such infrastructure I have enough tools to be able to hopefully learn from it. To do this I figure I'll need to build myself a cluster and deploy some things on it, then poke it. I'll start by doing so on bare metal; that removes variables around cloud providers and virtualisation and gives me an environment I know is isolated from everything else. I happen to have a GMK NucBox available, so I'll use that. As a first step I'm aiming to get a single node cluster deployed running some sort of web accessible service that is visible from the rest of my network. That should mean I've covered the basics of a Kubernetes install, a running service, and actually making it accessible. Of course I'm running Debian. I've got a Bullseye (Debian 11) install - not yet released as stable, but in freeze and therefore not a moving target. I wanted to use packages from Debian as much as possible, but it seems that the bits of Kubernetes available in main are mostly just building blocks and not a great starting point for someone new to Kubernetes. So to do the initial install I did the following:
# Install docker + nftables from Debian
apt install docker.io nftables
# Add the Kubernetes repo and signing key
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg > /etc/apt/k8s.gpg
cat > /etc/apt/sources.list.d/kubernetes.list <<EOF
deb [signed-by=/etc/apt/k8s.gpg] https://apt.kubernetes.io/ kubernetes-xenial main
EOF
apt update
apt install kubelet kubeadm kubectl
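# (Not from the original post: it's generally worth pinning these packages
# so a routine apt upgrade can't bump the cluster version underneath you.)
apt-mark hold kubelet kubeadm kubectl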
That resulted in a 1.21.1-00 install, which is current at the time of writing. I then used kubeadm to create the cluster:
kubeadm init --apiserver-advertise-address 192.168.53.147 --apiserver-cert-extra-sans udon.mynetwork
The extra parameters were to make the API server externally accessible from the host. I don't know if that was a good idea or not at this stage. kubeadm spat out a bunch of instructions, but the key piece was about copying the credentials to my user account. So I did:
mkdir ~noodles/.kube
cp -i /etc/kubernetes/admin.conf ~noodles/.kube/config
chown -R noodles ~noodles/.kube/
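# (An aside, not from the original post: for quick root-only access you can
# instead point kubectl straight at the admin config.)
export KUBECONFIG=/etc/kubernetes/admin.conf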
I was then able to see my node:
noodles@udon:~$ kubectl get nodes
NAME   STATUS     ROLES                  AGE     VERSION
udon   NotReady   control-plane,master   4m31s   v1.21.1
Ooooh. But why's it NotReady? Seems like it's a networking issue and I need to install a networking provider. The documentation on this is appalling. Flannel gets recommended as a simple option, but then turns out to need a --pod-network-cidr option passed to kubeadm and I didn't feel like cleaning up and running again (I've omitted all the false starts it took me to get to this point). Another pointer was to Weave, so I decided to try that with the following magic runes:
mkdir -p /var/lib/weave
head -c 16 /dev/urandom | shasum -a 256 | cut -d " " -f1 > /var/lib/weave/weave-passwd
kubectl create secret -n kube-system generic weave-passwd --from-file=/var/lib/weave/weave-passwd
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&password-secret=weave-passwd&env.IPALLOC_RANGE=192.168.0.0/24"
(I believe what that's doing is: the first 3 lines create a password and store it into the internal Kubernetes config so the weave pod can retrieve it. The final line then grabs a YAML config from Weaveworks to configure up weave. My intention is to delve deeper into what's going on here later; for now the primary purpose is to get up and running.) As I'm running a single node cluster I then had to untaint my control node so I could use it as a worker node too:
kubectl taint nodes --all node-role.kubernetes.io/master-
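# (An aside, not from the original post: newer Kubernetes releases name the
# taint node-role.kubernetes.io/control-plane, so there the equivalent is:
# kubectl taint nodes --all node-role.kubernetes.io/control-plane-)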
And then:
noodles@udon:~$ kubectl get nodes
NAME   STATUS   ROLES                  AGE   VERSION
udon   Ready    control-plane,master   15m   v1.21.1
Result. What's actually running? Nothing except the actual system stuff, so we need to ask for all namespaces:
noodles@udon:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                           READY   STATUS    RESTARTS   AGE
kube-system   coredns-558bd4d5db-4nvrg       1/1     Running   0          18m
kube-system   coredns-558bd4d5db-flrfq       1/1     Running   0          18m
kube-system   etcd-udon                      1/1     Running   0          18m
kube-system   kube-apiserver-udon            1/1     Running   0          18m
kube-system   kube-controller-manager-udon   1/1     Running   0          18m
kube-system   kube-proxy-6d8kg               1/1     Running   0          18m
kube-system   kube-scheduler-udon            1/1     Running   0          18m
kube-system   weave-net-mchmg                2/2     Running   1          3m26s
These are all things I'm going to have to learn about, but for now I'll nod and smile and pretend I understand. Now I want to actually deploy something to the cluster. I ended up with a simple HTTP echoserver (though it's not entirely clear that's actually the source for what I ended up pulling):
$ kubectl create deployment hello-node --image=k8s.gcr.io/echoserver:1.10
deployment.apps/hello-node created
$ kubectl get pod
NAME                          READY   STATUS    RESTARTS   AGE
hello-node-59bffcc9fd-8hkgb   1/1     Running   0          36s
$ kubectl expose deployment hello-node --type=NodePort --port=8080
$ kubectl get services
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
hello-node   NodePort    10.107.66.138   <none>        8080:31529/TCP   1m
Looks good. And to test locally:
curl http://10.107.66.138:8080/

Hostname: hello-node-59bffcc9fd-8hkgb
Pod Information:
	-no pod information available-
Server values:
	server_version=nginx: 1.13.3 - lua: 10008
Request Information:
	client_address=192.168.53.147
	method=GET
	real path=/
	query=
	request_version=1.1
	request_scheme=http
	request_uri=http://10.107.66.138:8080/
Request Headers:
	accept=*/*
	host=10.107.66.138:8080
	user-agent=curl/7.74.0
Request Body:
	-no body in request-
Neat. But my external network is 192.168.53.0/24 and that's a 10.* address, so how do I actually make it visible to other hosts? What I seem to need is an Ingress Controller, which provides some sort of proxy between the outside world and pods within the cluster. Let's pick nginx, because at least I have some vague familiarity with that and it seems like it should be able to do a bunch of HTTP redirection to different pods depending on the incoming request.
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.46.0/deploy/static/provider/cloud/deploy.yaml
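(A sanity check, not from the original post; the controller takes a little while to come up, so it's worth confirming its pods reach Running before proceeding:)
kubectl get pods -n ingress-nginx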
I then want to expose the hello-node to the outside world and I finally had to write some YAML:
cat > hello-ingress.yaml <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$1
spec:
  rules:
    - host: udon.mynetwork
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hello-node
                port:
                  number: 8080
EOF
i.e. incoming requests to http://udon.mynetwork/ should go to the hello-node on port 8080. I applied this:
$ kubectl apply -f hello-ingress.yaml
ingress.networking.k8s.io/example-ingress created
$ kubectl get ingress
NAME              CLASS    HOSTS            ADDRESS   PORTS   AGE
example-ingress   <none>   udon.mynetwork             80      3m8s
No address? What have I missed? Let's check the nginx service, which apparently lives in the ingress-nginx namespace:
noodles@udon:~$ kubectl get services -n ingress-nginx
NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                    AGE
ingress-nginx-controller             LoadBalancer   10.96.9.41      <pending>     80:32740/TCP,443:30894/TCP 13h
ingress-nginx-controller-admission   ClusterIP      10.111.16.129   <none>        443/TCP                    13h
<pending> does not seem like something I want. Digging around, it seems I need to configure the external IP myself. So I do:
kubectl patch svc ingress-nginx-controller -n ingress-nginx -p \
	'{"spec": {"type": "LoadBalancer", "externalIPs": ["192.168.53.147"]}}'
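# (Aside, not from the original post: hand-patching externalIPs is fine for a
# single-node experiment; on bare metal the more usual answer is to install a
# load-balancer implementation such as MetalLB, which assigns external IPs
# itself.)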
and things look happier:
noodles@udon:~$ kubectl get services -n ingress-nginx
NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                 AGE
ingress-nginx-controller             LoadBalancer   10.96.9.41      192.168.53.147   80:32740/TCP,443:30894/TCP   14h
ingress-nginx-controller-admission   ClusterIP      10.111.16.129   <none>           443/TCP                 14h
noodles@udon:~$ kubectl get ingress
NAME              CLASS    HOSTS           ADDRESS          PORTS   AGE
example-ingress   <none>   udon.mynetwork  192.168.53.147   80      14h
Let's try a curl from a remote host:
curl http://udon.mynetwork/

Hostname: hello-node-59bffcc9fd-8hkgb
Pod Information:
	-no pod information available-
Server values:
	server_version=nginx: 1.13.3 - lua: 10008
Request Information:
	client_address=192.168.0.5
	method=GET
	real path=/
	query=
	request_version=1.1
	request_scheme=http
	request_uri=http://udon.mynetwork:8080/
Request Headers:
	accept=*/*
	host=udon.mynetwork
	user-agent=curl/7.64.0
	x-forwarded-for=192.168.53.136
	x-forwarded-host=udon.mynetwork
	x-forwarded-port=80
	x-forwarded-proto=http
	x-real-ip=192.168.53.136
	x-request-id=6aaef8feaaa4c7d07c60b2d05c45f75c
	x-scheme=http
Request Body:
	-no body in request-
Ok, so that seems like success. I've got a single node cluster running a single actual application pod (the echoserver) and exporting it to the outside world. That's enough to start poking under the hood. Which is for another post, as this one is already getting longer than I'd like. I'll just leave some final thoughts of things I need to work out:
